THE CRITICAL INCIDENT TECHNIQUE
JOHN C. FLANAGAN
American Institute for Research and University of Pittsburgh


During the past ten years the 
writer and various collaborators have 
been engaged in developing and uti- 
lizing a method that has been named 
the “critical incident technique.” It 
is the purpose of this article to de- 
scribe the development of this meth- 
odology, its fundamental principles, 
and its present status. In addition, 
the findings of a considerable number 
of studies making use of the critical 
incident technique will be briefly re- 
viewed and certain possible further 
uses of the technique will be indi- 
cated. 

The critical incident technique con- 
sists of a set of procedures for col- 
lecting direct observations of human 
behavior in such a way as to facili- 
tate their potential usefulness in solv- 
ing practical problems and develop- 
ing broad psychological principles. 
The critical incident technique outlines procedures for collecting observed incidents having special significance and meeting systematically defined criteria.

By an incident is meant any observable human activity that is sufficiently complete in itself to permit inferences and predictions to be made about the person performing the act. To be critical, an incident must occur in a situation where the purpose or intent of the act seems fairly clear to the observer and where its consequences are sufficiently definite to leave little doubt concerning its effects.
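The two conditions in this definition can be made concrete as a record layout for a single reported incident. The following is a minimal sketch, assuming a hypothetical `Incident` schema invented for illustration; the field names, the completeness test, and the `is_critical` check are not part of the technique as described, only one possible way of encoding "complete in itself," "purpose fairly clear," and "consequences sufficiently definite."

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """One observed incident (hypothetical schema for illustration only)."""
    situation: str          # circumstances in which the act occurred
    action: str             # what the person observed actually did
    outcome: str            # consequences the observer saw
    purpose_clear: bool     # did the intent of the act seem fairly clear to the observer?
    effects_definite: bool  # were the consequences definite enough to leave little doubt?

    def is_complete(self) -> bool:
        # Assumption: an incident is "complete in itself" only if the
        # situation, the action, and the outcome were all described.
        return all(part.strip() for part in (self.situation, self.action, self.outcome))

    def is_critical(self) -> bool:
        # Both conditions of the definition must hold for a complete report.
        return self.is_complete() and self.purpose_clear and self.effects_definite


# A hypothetical report, not taken from any of the studies reviewed here.
report = Incident(
    situation="Engine lost power during take-off",
    action="Pilot lowered the nose and landed straight ahead",
    outcome="Aircraft stopped on the overrun without damage",
    purpose_clear=True,
    effects_definite=True,
)
print(report.is_critical())
```

A report that omits the action, or whose consequences the observer could not judge, would be excluded by this check rather than coded.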

Certainly in its broad outlines and 
basic approach the critical incident 
technique has very little which is 
new about it. People have been mak- 
ing observations on other people for 
centuries. The work of many of the 
great writers of the past indicates 
that they were keen observers of their 
fellow men. Some of these writers 
must have relied on detailed notes 
made from their observations. Others 
may have had unusual abilities to 
reconstruct memory images in vivid 
detail. Some may have even made a 
series of relatively systematic obser- 
vations on many instances of a par- 
ticular type of behavior. Perhaps 
what is most conspicuously needed to 
supplement these activities is a set 
of procedures for analyzing and syn- 
thesizing such observations into a 
number of relationships that can be 
tested by making additional observa- 
tions under more carefully controlled 
conditions. 


BACKGROUND AND EARLY 
DEVELOPMENTS 


The roots of the present procedures can be traced back directly to the studies of Sir Francis Galton nearly 70 years ago, and to later developments such as time sampling studies of recreational activities, controlled observation tests, and anecdotal records. The critical incident technique as such, however, can best be regarded as an outgrowth of studies in the Aviation Psychology Program of the United States Army Air Forces in World War II. The Aviation Psychology Program was established in the summer of 1941 to develop procedures for the selection and classification of aircrews.

One of the first studies (40) carried 
out in this program was the analysis 
of the specific reasons for failure in 
learning to fly that were reported for 
1,000 pilot candidates eliminated 
from flight training schools in the 
summer and early fall of 1941. The 
basic source used in this analysis was 
the proceedings of the elimination 
boards. In these proceedings the 
pilot instructors and check pilots 
reported their reasons for eliminating 
the particular pilot. It was found that many of the reasons given were clichés and stereotypes such as “lack of inherent flying ability” and “inadequate sense of sustentation,” or generalizations such as “unsuitable temperament,” “poor judgment,” or “insufficient progress.” However, along with these a number of specific observations of particular behaviors were reported. This study provided the basis for the research program on selecting pilots. Although it was found very useful, it also indicated very clearly the need for better procedures for obtaining a representative sample of factual incidents regarding pilot performance.

A second study (13), which empha- 
sized the importance of factual re- 
ports on performance made by com- 
petent observers, was carried out in 
the winter of 1943-1944 in the 8th, 
9th, 12th, and 15th Air Forces. This 
study collected the reasons for the 
failures of bombing missions as re- 
ported in the Group Mission Reports. 
Although in the preparation of these reports much greater emphasis was given to determining the precise facts in the case, it was apparent that in many instances the official reports did not provide a complete record of all the important events. Even with these limitations, the information given was found to be of considerable value, and the systematic tabulations that were prepared provided the basis for a series of recommendations that resulted in important changes in Air Force selection and training procedures.

In the summer of 1944 a series of 
studies (74) was planned on the prob- 
lem of combat leadership in the 
United States Army Air Forces. These represent the first large-scale, systematic effort to gather specific incidents of effective or ineffective behavior with respect to a designated activity. The instructions asked the combat veterans to report incidents observed by them that involved behavior which was especially helpful or inadequate in accomplishing the assigned mission. The statement finished with the request, “Describe the officer’s action. What did he do?” Several thousand incidents were collected in this way and analyzed to provide a relatively objective and factual definition of effective combat leadership. The resulting set of descriptive categories was called the “critical requirements” of combat leadership.

Another study (74) conducted in the Aviation Psychology Program involved a survey of disorientation while flying.¹ Disorientation in this study was defined to include any experience denoting uncertainty as to one’s spatial position in relation to the vertical. In this study pilots returning from combat were asked “to think of some occasion during combat flying in which you personally experienced feelings of acute disorientation or strong vertigo.” They were then asked to describe what they “saw, heard, or felt that brought on the experience.” This study led to a number of recommendations regarding changes in cockpit and instrument panel design and in training in order to overcome and prevent vertigo while flying.

¹ This study was planned by Paul M. Fitts, Jr., who also contributed to the previously mentioned USAAF studies and planned and carried out the interview study with pilots described below on the design of instruments, controls, and arrangements.

In a project carried out in the Aviation Psychology Program in 1946, Fitts and Jones (12) collected descriptions of specific experiences from pilots in taking off, flying on instruments, landing, using controls, and using instruments. These interviews with pilots were electrically recorded. They provided many factual incidents that were used as a basis for planning research on the design of instruments and controls and the arrangement of these within the cockpit.

In addition to the collection of specific incidents and the formulation of critical requirements, as outlined above, the summary volume (13) for the Aviation Psychology Program Research Reports contained a discussion of the theoretical basis of procedures for obtaining the critical requirements of a particular activity. Perhaps the best method of describing the status of these procedures at the close of the war is to quote from the discussion in this summary volume, which was written in the late spring of 1946. In the section on techniques for defining job requirements, the present author wrote as follows:
The principal objective of job analysis procedures should be the determination of critical requirements. These requirements include those which have been demonstrated to have made the difference between success and failure in carrying out an important part of the job assigned in a significant number of instances. Too often, statements regarding job requirements are merely lists of all the desirable traits of human beings. These are practically no help in selecting, classifying, or training individuals for specific jobs. To obtain valid information regarding the truly critical requirements for success in a specific assignment, procedures were developed in the Aviation Psychology Program for making systematic analyses of causes of good and poor performance.

Essentially, the procedure was to obtain first-hand reports, or reports from objective records, of satisfactory and unsatisfactory execution of the task assigned. The cooperating individual described a situation in which success or failure was determined by specific reported causes.

This procedure was found very effective in obtaining information from individuals concerning their own errors, from subordinates concerning errors of their superiors, from supervisors with respect to their subordinates, and also from participants with respect to co-participants (13, pp. 273-274).


DEVELOPMENTAL STUDIES AT 
THE AMERICAN INSTITUTE 
FOR RESEARCH 


At the close of World War II some of the psychologists who had participated in the USAAF Aviation Psychology Program established the American Institute for Research, a nonprofit scientific and educational organization. The aim of this organization is the systematic study of human behavior through a coordinated program of scientific research that follows the same general principles developed in the Aviation Psychology Program. It was in connection with the first two studies undertaken by the Institute in the spring of 1947 that the critical incident technique was more formally developed and given its present name.

These studies were natural extensions of the previous research in the Aviation Psychology Program. The study reported by Preston (52) dealt with the determination of the critical requirements for the work of an officer in the United States Air Force. In this study, many of the procedural problems were first subjected to systematic tryout and evaluation. Six hundred and forty officers were interviewed, and a total of 3,029 critical incidents were obtained. This led to the development of a set of 58 critical requirements classified into six major areas. The second study, reported by Gordon (27, 28), was carried out to determine the critical requirements of a commercial airline pilot. In this study, several different sources were used to establish the critical requirements of the airline pilot. These included training records, flight check records including the specific comments of check pilots, critical pilot behaviors reported in accident records, and critical incidents reported anonymously in interviews by the pilots themselves.
From this study, 733 critical pilot 
behaviors were classified into 24 criti- 
cal requirements of the airline pilot's 
job. These were used to develop selec- 
tion tests to measure the aptitudes 
and other personality characteristics 
found critical for success in the job. 
They also provided the basic data 
for the formulation of an objective 
flight check to determine the eligi- 
bility of applicants for the airline 
transport rating. 

The third application of the critical 
incident technique by the staff of the 
American Institute for Research was 
in obtaining the critical requirements 
for research personnel on a project 
sponsored by the Psychological Sci- 
ences Division of the Office of Naval 
Research. In this study (20), about 
500 scientists in 20 research laboratories were interviewed. These scientists reported more than 2,500
critical incidents. The critical be- 
haviors were used to formulate induc- 
tively a set of 36 categories, which 
constitutes the critical requirements 
for the effective performance of the 
duties of research personnel in the 
physical sciences. This initial study 
provided the basis for the develop- 
ment of selection tests, proficiency 
measures, and procedures for evaluat- 
ing both job performance and the 
research report. 

Another project undertaken by the 
American Institute for Research in 
the spring of 1948 provided valuable 
experience with the critical incident 
technique. This study, reported by 
Nagay (48), was done for the Civil 
Aeronautics Administration under 
the sponsorship of the Committee on 
Aviation Psychology of the National 
Research Council. It was concerned 
with the air route traffic controller's 
job. One of the innovations in this 
study was the use of personnel of the 
Civil Aeronautics Administration 
who had no previous psychological 
training in collecting critical inci- 
dents by means of personal inter- 
views. In previous studies all such 
interviewing had been conducted by 
psychologists with extensive training 
in such procedures. In this study, aeronautical specialists from each of the seven regions conducted the interviews in their regions after a brief training period. An interesting finding from this study was the clear reflection of seasonal variations in flying conditions in the types of incidents reported. The study also demonstrated the selective recall of dramatic or other special types of incidents. This bias was especially noticeable in the incidents reported several months after their occurrence. The incidents obtained in this study were used to develop procedures for evaluating the proficiency of air route traffic controllers and also for developing a battery of selection tests for this type of personnel.

In the spring of 1949 the American 
Institute for Research undertook a 
study to determine the critical job 
requirements for the hourly wage 
employees in the Delco-Remy Divi- 
sion of the General Motors Corpora- 
tion. This study, reported by Miller 
and Flanagan (46), was the first 
application of these techniques in an 
industrial situation. Foremen who 
were members of a committee ap- 
pointed to develop employee evalua- 
tion procedures collected 2,500 criti- 
cal incidents in interviews with the 
other foremen in the plants. On the 
basis of these data a form was prepared for collecting incidents on a day-to-day basis as a continuous record of job performance.

Using this form, the Performance Record for Hourly Wage Employees (21), three groups of foremen kept
records on the performance of their 
employees for a two-week period. A 
group of 24 foremen recorded inci- 
dents daily; another group of 24 


foremen reported incidents at the 
end of each week; and a third group 
containing the same number of fore- 
men reported incidents only at the 
end of the two-week period. The 
three groups of foremen represented 
comparable conditions of work and 
supervision. The foremen reporting 
daily reported 315 critical incidents; 
the foremen reporting weekly, 155 
incidents; and the foremen reporting 
only once at the end of two weeks 
reported 63 incidents. Thus, foremen 
who reported only at the end of the 
week had forgotten approximately 
one half of the incidents they would 
have reported under a daily reporting 
plan. The foremen who reported only at the end of the two-week period appeared to have forgotten 80 per cent of the incidents observed.
Although it is possible that the find- 
ings may be partially attributed to 
the fact that the foremen making 
daily records actually observed more 
critical incidents because of the daily 
reminder at the time of recording, it 
is clear that much better results can 
be expected when daily recording is 
used. 
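The percentages in this paragraph follow from taking the daily-recording count as the baseline. A short sketch of that arithmetic, using the counts reported above; the helper name `percent_forgotten` is introduced only for illustration. As the paragraph cautions, the daily baseline may itself be inflated if daily recording prompted closer observation, so these figures may overstate the amount actually forgotten.

```python
# Incident counts from the Delco-Remy comparison of reporting schedules.
REPORTED = {"daily": 315, "weekly": 155, "end of two weeks": 63}


def percent_forgotten(count: int, baseline: int) -> float:
    """Per cent of the baseline (daily) count that went unreported."""
    return 100.0 * (baseline - count) / baseline


baseline = REPORTED["daily"]
for schedule, count in REPORTED.items():
    shortfall = percent_forgotten(count, baseline)
    print(f"{schedule}: {count} incidents, {shortfall:.1f} per cent fewer than daily")
```

This prints 0.0 for daily reporting, 50.8 for weekly reporting (the "approximately one half"), and 80.0 for reporting at the end of two weeks.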

Another analysis based on data collected at the Delco-Remy Division compared the number of critical incidents of various types obtained from interviews with those recorded daily by the foremen on the performance record. Although there were some differences in the relative frequencies for specific categories, the general patterns appeared to be quite similar. These results suggest that critical incidents obtained from interviews can be relied on to provide a relatively accurate account of job performance if suitable precautions are taken to prevent systematic bias.

In addition to the development of 
the performance record described 
above, the critical incidents collected 
in this study were used as the basis 
for constructing selection tests cover- 
ing both aptitude (18) and attitude 
(2) factors. 


STUDIES CARRIED OUT AT THE 
UNIVERSITY OF PITTSBURGH 


A substantial number of studies 
have been carried out in the depart- 
ment of psychology at the University 
of Pittsburgh by students working 
for advanced degrees under the au- 
thor’s direction. Most of these studies 
had as their objective the determina- 
tion of the critical requirements for 
a specific occupational group or ac- 
tivity. Many of them also included 
contributions to technique. In 1949 Wagner (66) completed a dissertation on the critical requirements for dentists. In this study, critical incidents were obtained from three sources: patients, dentists, and dental school instructors. The incidents were classified into four main aspects of the dentist’s job: (a) demonstrating technical proficiency; (b) handling patient relationships; (c) accepting professional responsibility; and (d) accepting personal responsibility. As might be expected, the patients did not report as large a proportion of incidents for demonstrating technical proficiency or accepting professional responsibility as did the other two groups, and the instructors reported only a relatively small proportion of their incidents in the area of handling patient relationships.
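Because the three groups of observers contributed different numbers of incidents, comparisons of this kind rest on the proportion of each group's incidents falling in each area rather than on raw counts. A minimal sketch, assuming incidents are stored as hypothetical `(source, area)` pairs; the function name and the sample data are invented for illustration and are not Wagner's results.

```python
from collections import Counter, defaultdict


def area_proportions(incidents: list[tuple[str, str]]) -> dict[str, dict[str, float]]:
    """For each source, the share of that source's incidents falling in each area."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for source, area in incidents:
        counts[source][area] += 1
    # Divide by each source's own total so that sources of different size are comparable.
    return {
        source: {area: n / sum(by_area.values()) for area, n in by_area.items()}
        for source, by_area in counts.items()
    }


# Hypothetical sample: two sources contributing different numbers of incidents.
sample = [
    ("patient", "patient relationships"),
    ("patient", "patient relationships"),
    ("patient", "technical proficiency"),
    ("instructor", "technical proficiency"),
    ("instructor", "technical proficiency"),
    ("instructor", "technical proficiency"),
    ("instructor", "professional responsibility"),
]
for source, shares in area_proportions(sample).items():
    print(source, {area: round(p, 2) for area, p in shares.items()})
```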

On the basis of the findings from this study, a battery of selection tests was developed for use by the University of Pittsburgh School of Dentistry. A number of proficiency tests for measuring ability with respect to certain of the critical requirements were also developed using these results as a basis.

Another dissertation completed in 1949 was Finkle’s (11) study of the critical requirements of industrial foremen. This study was conducted in the East Pittsburgh plant of the Westinghouse Electric Corporation. Critical incidents were obtained from foremen, general foremen, and staff personnel. A number of points pertaining to technique were studied. One finding concerned the effect that the degree of importance or exceptionalness set up as a criterion for reporting or ignoring incidents had on the types of incidents obtained. The incidents obtained from the use of questions that asked for incidents only slightly removed from the norm were compared with incidents obtained from questions intended to elicit more definitely effective or ineffective behaviors. Some examples of these questions are:


1. Think of a time when a foreman has done something that you felt should be encouraged because it seemed to be in your opinion an example of good foremanship. (Effective: slight deviation from norm.)

2. Think of a time when a foreman did something that you thought was not up to par. (Ineffective: slight deviation from norm.)

3. Think of a time when a foreman has, in your opinion, shown definitely good foremanship, the type of action that points out the superior foreman. (Effective: substantial deviation from the norm.)

4. Think of a time when a foreman has, in your opinion, shown poor foremanship, the sort of action which if repeated would indicate that the man was not an effective foreman. (Ineffective: substantial deviation from norm.)


The frequencies of incidents obtained in each of the 40 categories into which the effective behaviors were classified were compared for the questions requesting slight and substantial deviations from the norm, and the significance of the differences was tested by means of the chi-square test. Two of the differences were significant at the 1 per cent level and one at the 5 per cent level. Comparisons of the frequencies in each of the 40 categories for ineffective incidents failed to reveal any chi squares significant at either the 5 per cent or the 1 per cent level.
The effective questions involving only a slight deviation from the norm resulted in more incidents concerned with gaining the respect and loyalty of the workers and also in more incidents that involved making, encouraging, and accepting suggestions. They produced significantly fewer incidents regarding fitting men to jobs. The small number of significant differences, only three in 80 comparisons, suggests that the types of incidents obtained are not very greatly changed by variations in wording of the questions comparable to those shown above. It seems likely that this is at least partially due to the fact that the persons interviewed report only incidents that represent a fairly substantial deviation from the norm regardless of the precise wording of the question asked.
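Each of these 80 comparisons can be read as a small contingency table. The sketch below assumes, as one plausible reading of the text rather than a documented detail of the study, that each category was tested with a 2×2 table (incidents in the category versus all others, for slight-deviation versus substantial-deviation questions) using Pearson's chi-square with one degree of freedom; the counts are hypothetical, not Finkle's data. It also shows why three significant results in 80 comparisons is a small number: if the comparisons were independent, about four would be expected at the 5 per cent level by chance alone.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the 2x2 table

                        in category   all other categories
        slight question      a                 b
        substantial          c                 d
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("table has an empty row or column")
    return n * (a * d - b * c) ** 2 / denom


# Critical values of chi-square with 1 degree of freedom.
CRIT_5_PER_CENT = 3.841
CRIT_1_PER_CENT = 6.635


def significance(chi2: float) -> str:
    if chi2 >= CRIT_1_PER_CENT:
        return "1 per cent"
    if chi2 >= CRIT_5_PER_CENT:
        return "5 per cent"
    return "not significant"


# Hypothetical counts for a single category.
chi2 = chi_square_2x2(30, 170, 12, 188)
print(round(chi2, 2), significance(chi2))

# Expected number of chance "significant" results among 80 independent tests at 5 per cent.
expected_by_chance = 80 * 5 / 100
print(expected_by_chance)
```

The comparisons within one booklet were not strictly independent, so the figure of four is only a rough benchmark.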

Another comparison made in this study related to the influence of asking for an effective or an ineffective incident first. About 10 per cent more incidents were obtained from booklets requesting effective incidents first than from booklets requesting ineffective incidents first. This difference was sufficiently small so that it could reasonably be attributed to chance sampling fluctuations.

The incidents collected in this study were used, along with other data, in the preparation of a Performance Record for Foremen and Supervisors (23).

A study was conducted by Nevins (50) on the critical requirements of bookkeepers in sales companies. She collected incidents relating to applicants for bookkeeping positions as well as for employees working in this capacity.

For the collection of the information about the practicing bookkeepers, a modification in the critical incident technique was made. This was done because, in the bookkeeping profession, success and failure are usually defined in terms of persistent behavioral patterns. Occasional mistakes in adding and balancing accounts are expected, but repeated errors are considered serious. Instead of the single incident, therefore, many of the items included represented either a pattern of behaviors or a series of similar behaviors.


Weislogel (72) determined the critical requirements for life insurance agency heads. A principal feature of his study related to the comparison of two types of agency heads: managers and general agents. It was believed that the critical behaviors for one type of agency head might provide a different pattern than that obtained for the other. This hypothesis was not confirmed by the analysis of the obtained incidents. The patterns of critical requirements were found to be quite similar for the two types of administrators.

Smit (58) carried out a study to determine the critical requirements for instructors of general psychology courses. Perhaps the finding of most general importance in this study was the existence of substantial differences between the patterns of critical incidents reported by students and faculty. The faculty reported a significantly larger percentage of effective behaviors in the following areas: giving demonstrations or experiments, using discussion group techniques, encouraging and ascertaining students’ ideas and opinions.

The students, on the other hand, contributed a larger percentage of behaviors in the following areas: reviewing examinations, distributing grades, and explaining grades; using lecture aids such as drawings, charts, movies, models, and apparatus; using project techniques; giving test questions on assigned material; helping students after class and during class recess; the manner of the instructor.

The faculty reported a larger percentage of ineffective behaviors concerning maintaining order. The ineffective behaviors that were reported in a larger percentage by students involved these areas: presenting requirements of the course, using effective methods of expression, dealing with students’ questions, pointing out fallacies, reviewing and summarizing basic facts and principles, using project techniques, using verbal diagnostic teaching techniques, achievement testing students on assigned material, objective type achievement testing, using humor.

This is a good illustration of the problem of the competence of various types of available observers to evaluate the contribution of a specific action to the general aim of the activity. Examination of the reports from students indicated a somewhat limited sphere of competence. Apparently one of the principal reasons for this was the lack of perspective on the part of the students and their inability to keep the general aim of the instructor clearly in mind because of its divergence from their own immediate aims. In many cases, this latter aim seemed to be directed toward achieving a satisfactory grade in the course.

Eilbert (7) developed a functional description of emotional immaturity. The contributors of critical incidents included psychiatrists, psychologists, psychiatric social workers, occupational therapists, nurses, and corpsmen from a military hospital, plus 13 psychologists in nonmilitary organizations. The subjects of the incidents were primarily patients under psychiatric care.

The contributors were given a form that oriented them to the concept “emotional immaturity” by suggesting that it was revealed generally by childlike modes of behavior. The questions used to elicit incidents were: Have you recently thought of someone as being emotionally immature (regardless of diagnosis)? What specifically happened that gave you this impression? What would have been a more mature reaction to the same situation?
Because of the indefinite nature of
the concept, it was felt that a check 
should be made on the contributor’s 
understanding of his task. Twenty of 
the participating persons were asked 
to summarize briefly their interpreta- 
tion of what they had been asked to 
do. This appeared to be very useful 
in developing the phrasing of the 
questions so that they were uniformly 
interpreted by the various observers. 

The author of the study classified 
all the immaturities on the basis of a 
classification system developed from 
preliminary categorizations prepared 
by six of the contributors. This clas- 
sification was submitted to 14 psychi- 
atrists for review. They were asked 
to indicate which of the categories 
they were willing to accept as a type 
of immaturity as the term had been 
defined in an official document. 
More than half the categories were 
accepted by at least 13 of the 14 
judges, and none was rejected by more 
than 50 per cent of the judges. It was 
felt then that the system was accept- 
able. 
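The two summary figures reported for the psychiatrists' review can be computed directly from a tally of how many judges accepted each category. A minimal sketch, assuming a hypothetical table of acceptance counts; the function name is invented, and the `acceptable` flag simply combines the two reported conditions to illustrate the check. It is not presented as the decision rule actually used in the study.

```python
def review_summary(accepts: dict[str, int], n_judges: int = 14) -> dict:
    """Summarize how many judges accepted each proposed category."""
    # "Accepted by at least 13 of the 14 judges" (all but one judge).
    near_unanimous = [c for c, k in accepts.items() if k >= n_judges - 1]
    # "Rejected by more than 50 per cent": rejections exceed half the judges.
    majority_rejected = [c for c, k in accepts.items() if (n_judges - k) * 2 > n_judges]
    return {
        "near_unanimous_share": len(near_unanimous) / len(accepts),
        "majority_rejected": majority_rejected,
        # Hypothetical combination: more than half near-unanimous, none majority-rejected.
        "acceptable": len(near_unanimous) * 2 > len(accepts) and not majority_rejected,
    }


# Hypothetical acceptance counts out of 14 judges.
counts = {"A": 14, "B": 13, "C": 13, "D": 9, "E": 7}
print(review_summary(counts))
```

Note that a category accepted by exactly 7 of 14 judges is rejected by 50 per cent, not more, and so passes the second condition.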

This study illustrates the applica- 
tion of the critical incident technique 
to the study of personality. It is 
believed that this study provides an 
excellent example of the possibilities 
for developing more specific behavi- 
oral descriptions. 

Folley (24) reported on the critical 
requirements of sales clerks in depart- 
ment stores. The behaviors were 
abstracted from narrative records of 
individual shopping incidents written 
by shoppers who were relatively in- 
experienced in evaluating sales per- 
sonnel. For various reasons, including 
the competence of the observers, 
their training, and their limited point 
of view, the resulting description 
must be regarded as only partial. 

In the past few years, many other 
individuals and groups have made 
use of the techniques described 
above, or modifications of them, in a
wide variety of studies. Some of these 
studies on which reports are being 
published will be reviewed briefly in 
the section on applications. 


THE PROCEDURE IN ITS 
PRESENT FORM 


From the foregoing discussion, it is 
clear that the critical incident tech- 
nique is essentially a procedure for 
gathering certain important facts 
concerning behavior in defined situa- 
tions. It should be emphasized that 
the critical incident technique does 
not consist of a single rigid set of 
rules governing such data collection. 
Rather it should be thought of as a 
flexible set of principles which must 
be modified and adapted to meet the 
specific situation at hand. 

The essence of the technique is 
that only simple types of judgments 
are required of the observer, reports 
from only qualified observers are 
included, and all observations are 
evaluated by the observer in terms of 
an agreed upon statement of the pur- 
pose of the activity. Of course, sim- 
plicity of judgments is a relative 
matter. The extent to which a re- 
ported observation can be accepted 
as a fact depends primarily on the 
objectivity of this observation. By 
objectivity is meant the tendency for 
a number of independent observers 
to make the same report. Judgments 
that two things have the same effect 
or that one has more or less effect 
than the other with respect to some 
defined purpose or goal represent the 
simplest types of judgments that can 
be made. The accuracy and therefore 
the objectivity of the judgments de- 
pend on the precision with which the 
characteristic has been defined and 
the competence of the observer in 
interpreting this definition with rela- 
tion to the incident observed. In this 
latter process, certain more difficult 
types of judgments are required regarding the relevance of various conditions and actions on the observed success in attaining the defined purpose for this activity.
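Since objectivity is defined here as the tendency of a number of independent observers to make the same report, one simple way to quantify it is the proportion of observer pairs who give the same judgment of an incident. The sketch below is an assumption, not a procedure given in the text: the function names are hypothetical and the judgments are invented, and chance-corrected indices such as Cohen's kappa would be an equally reasonable choice.

```python
from itertools import combinations


def pairwise_agreement(judgments: list[str]) -> float:
    """Share of observer pairs who made the same judgment of one incident."""
    pairs = list(combinations(judgments, 2))
    if not pairs:
        raise ValueError("at least two observers are needed")
    return sum(a == b for a, b in pairs) / len(pairs)


def mean_agreement(reports: dict[str, list[str]]) -> float:
    """Average pairwise agreement across all incidents."""
    return sum(pairwise_agreement(j) for j in reports.values()) / len(reports)


# Hypothetical judgments by four independent observers, each made with
# respect to the same agreed-upon statement of the purpose of the activity.
reports = {
    "incident 1": ["effective", "effective", "effective", "effective"],
    "incident 2": ["effective", "effective", "ineffective", "effective"],
}
print(mean_agreement(reports))
```

With four observers there are six pairs per incident; the single dissenting judgment on incident 2 breaks three of them, giving 0.5 for that incident and a mean of 0.75.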

It is believed that a fair degree of 
success has been achieved in develop- 
ing procedures that will be of assist- 
ance in gathering facts in a rather 
objective fashion with only a mini- 
mum of inferences and interpreta- 
tions of a more subjective nature. 
With respect to two other steps that are essential if these incidents are to be of value, a comparable degree of objectivity has not yet been obtained. In both instances, the subjective factors seem clearly due to current deficiencies in psychological knowledge.

The first of these two other steps consists of the classification of the critical incidents. In the absence of an adequate theory of human behavior, this step is usually an inductive one and is relatively subjective. Once a classification system has been developed for any given type of critical incidents, a fairly satisfactory degree of objectivity can be achieved in placing the incidents in the defined categories.
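Once a category scheme exists, placing incidents in it and summarizing the result is largely bookkeeping. The sketch below assumes each incident has already been given a category label by a classifier. It counts incidents per category and puts any label outside the scheme into an "unclassified" bin, so that gaps in the scheme stay visible. The category names, the incident data, and the function name are illustrative assumptions only.

```python
from collections import Counter


def tally_categories(assignments, categories):
    """Count incidents per category and report each category's share.

    `assignments` maps an incident id to the category a classifier chose;
    any label not in `categories` is counted under "unclassified".
    """
    counts = Counter(
        label if label in categories else "unclassified"
        for label in assignments.values()
    )
    total = sum(counts.values())
    return {label: (n, n / total) for label, n in counts.most_common()}


# Hypothetical category scheme and classified incidents.
categories = {
    "meeting production requirements",
    "accepting changes in jobs",
    "wasting time",
}
assignments = {
    1: "meeting production requirements",
    2: "wasting time",
    3: "meeting production requirements",
    4: "helping a new employee",  # falls outside the scheme
}

for label, (n, share) in tally_categories(assignments, categories).items():
    print(f"{label}: {n} ({share:.0%})")
```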

The second step refers to inferences regarding practical procedures for improving performance based on the observed incidents. Again, in our present stage of psychological knowledge, we are rarely able to deduce or predict with a high degree of confidence the effects of specific selection, training, or operating procedures on future behaviors of the type observed. The incidents must be studied in the light of relevant established principles of human behavior and of the known facts regarding background factors and conditions operating in the specific situation. From this total picture hypotheses are formulated. In only a few types of activities are there both sufficient established principles and sufficient information regarding the effective factors in the situation to provide a high degree of confidence in the resulting hypotheses regarding specific procedures for improving the effectiveness of the results.

In the sections which follow, the 
five main steps included in the pres- 
ent form of the procedures will be 
described briefly. In order to provide the worker with maximum flexibility at the present stage, each step is presented with examples of present best practice, the underlying principles for the step, and its chief limitations, together with, wherever possible, suggestions for studies that may result in future improvements in the methods.


1. General Aims 


A basic condition necessary for any work on the formulation of a functional description of an activity is a fundamental orientation in terms of the general aims of the activity. No planning and no evaluation of specific behaviors are possible without a general statement of objectives. The trend in the scientific field toward operational statements has led a number of writers to try to describe activities or functions in terms of the acts or operations performed, the materials acted on, the situations involved, the results or products, and the relative importance of various acts and results. These analyses have been helpful in emphasizing the need for more specific and detailed descriptions of the requirements of activities. Typically, however, such discussions have failed to emphasize the dominant role of the general aim in formulating a description of successful behavior or adjustment in a particular situation.

In its simplest form, the functional description of an activity specifies precisely what it is necessary to do and not to do if participation in the activity is to be judged successful or effective. It is clearly impossible to report that a person has been either effective or ineffective in a particular activity by performing a specific act unless we know what he is expected to accomplish. For example, a supervisor’s action in releasing a key worker for half a day to participate in a recreational activity might be evaluated as very effective if the general aim of the foreman was to get along well with the employees under him. On the other hand, this same action might be evaluated as ineffective if the primary general aim is the immediate production of materials or services.

In the case of the usual vocational 
activities the supervisors can be ex- 
pected to supply this orientation. In 
certain other types of activities, such 
as civic, social, and recreational ac- 
tivities, there frequently is no super- 
visor. The objectives of participation 
in the activity must then be deter- 
mined from the participants them- 
selves. In some instances, these may 
not be verbalized to a sufficient ex- 
tent to make it possible to obtain 
them directly. 

Unfortunately, in most situations 
there is no one general aim which is 
the correct one. Similarly, there is 
rarely one person or group of persons 
who constitute an absolute, authori- 
tative source on the general aim of 
the activity. In a typical manufac- 
turing organization the foreman, the 
plant manager, the president, and the 
stockholders might define the general 
aim of the workers in a particular 
section somewhat differently. It is not possible to say that one of these groups knows the correct general aim and the others are wrong. This does not mean that one general aim is as good as another and that it is unimportant how we define the purpose of the activity. It does mean that we cannot hope to get a completely objective and acceptable general aim for a specific activity. The principal criterion in formulating procedures for establishing the general aim of the activity should be the proposed use of the functional description of the activity which is being formulated. Unless the general aim used is acceptable to the potential users of the detailed statement of requirements, the whole effort in formulating this statement will have been wasted.

The most useful statements of aims 
seem to center around some simple 
phrase or catchword which is slogan- 
like in character. Such words provide 
a maximum of communication with 
only a minimum of possible misinterpretation. Such words as “appreciation,” “efficiency,” “development,” “production,” and “service” are likely to be prominent in statements of general aims. For example, the general aim of a teacher in elementary school art classes might be the development of an appreciation of various visual art forms on the part of the students. The general aim of the good citizen might be taken as effective participation in the development and application of the rules and procedures by which individuals and groups are assisted in achieving their various goals.

With the aid of a form of the type shown in Fig. 1, the ideas of a number of well-qualified authorities can be collected. It is expected that in response to the question on the primary purpose of the activity many persons will give a fairly lengthy and detailed statement. The request to summarize is expected to get them to condense this into a brief usable statement. These should be pooled and a trial form of the statement of general aim developed. This statement should be referred either to these authorities or to others to obtain a final statement of the general aim that is acceptable to them. Necessary revisions should be made as indicated by these discussions. Usually considerable effort is required to avoid defeating the purpose of the general aim by cluttering up the statement with specific details and qualifying conditions.

OUTLINE FOR INTERVIEW TO ESTABLISH THE GENERAL AIM FOR AN ACTIVITY

Introductory statement: We are making a study of (specify activity). We believe you are especially well qualified to tell us about (specify activity).

Request for general aim: What would you say is the primary purpose of (specify activity)?

Request for summary: In a few words, how would you summarize the general aim of (specify activity)?

Fig. 1. SAMPLE FORM FOR USE IN OBTAINING GENERAL AIM

In summary, the general aim of an 
activity should be a brief statement 
obtained from the authorities in the 
field which expresses in simple terms 
those objectives to which most people
would agree. Unless a brief, simple 
statement has been obtained, it will 
be difficult to get agreement among 
the authorities. Also it will be much 
harder to convey a uniform idea to 
the participants. This latter group 
will get an over-all impression and 
this should be as close to the desired 
general aim as possible. 


2. Plans and Specifications 


To focus attention on those aspects of behavior which are believed to be crucial in formulating a functional description of the activity, precise instructions must be given to the observers. It is necessary that these instructions be as specific as possible with respect to the standards to be used in evaluation and classification. The group to be studied also needs to be specified.

One practical device for obtaining 
specific data is to obtain records of 
“critical incidents” observed by the 
reporting personnel. Such incidents 
are defined as extreme behavior, 
either outstandingly effective or in- 
effective with respect to attaining the 
general aims of the activity. The 
procedure has considerable efficiency 
because of the use of only the ex-
tremes of behavior. It is well known
that extreme incidents can be more 
accurately identified than behavior 
which is more nearly average in 
character. 

One of the primary aims of scientific techniques is to insure objectivity for the observations being made and reported. Such objectivity, that is, agreement among independent observers, can only be attained if they are all following the same set of rules. It is essential that these rules be clear and specific. In most situations the following specifications will need to be established and made explicit prior to collecting the data:


a. The situations observed. The first necessary specification is a delimitation of the situations to be observed. This specification must include information about the place, the persons, the conditions, and the activities. Such specifications are rather easily defined in many instances. For example, such brief specifications as observations of “the behavior in classrooms of regularly employed teachers in a specified high school while instructing students during class periods” constitute a fairly adequate definition of a situation of this type.

In complex situations it is probably essential not only that the specifications with respect to the situation be relatively complete and specific, but also that practical examples be provided to assist the observer in deciding in an objective fashion whether or not a specific behavior should be observed and recorded.

b. Relevance to the general aim. After the de- 
cision has been made that a particular situa- 
tion is an appropriate one for making observa- 
tions, the next step is to decide whether or not 
a specific behavior which is observed is rele- 
vant to the general aim of the activity as de- 
fined in the section above. For example, if the 
general aim of the activity was defined as sus- 
tained high quality and quantity of produc- 
tion, it might be difficult to decide whether or 
not to include an action such as encouraging 
an unusually effective subordinate to get train- 
ing that would assist him in developing his 
ability in an avocational or recreational activ-
ity not related to his work. In this case, it
might be specified that any action which either 
directly or indirectly could be expected over a 
long period of time to have a significant effect 
on the general aim should be included. If it 
could not be predicted with some confidence 
whether this effect would be good or bad, it 
should probably not be considered. 

The extent of detail required to obtain ob- 
jectivity with respect to this type of decision 
depends to a considerable degree on the back- 
ground and experiences of the observers with 
respect to this activity. For example, super- 
visors with substantial experience in a particu- 
lar company can be expected to agree on 
whether or not a particular behavior is rele- 
vant to the attainment of the general aim. 
On the other hand, if outside observers were 
to be used, it would probably be necessary to 
specify in considerable detail the activities 
that can be expected to have an effect on the 
general aim. 

c. Extent of effect on the general aim. The re- 
maining decision that the observer must make 
is how important an effect the observed inci- 
dent has on the general aim. It is necessary to 
specify two points on the scale of importance: 
(a) a level of positive contributions to the gen- 
eral aim in specific terms, preferably including 
a concrete example, and (b) the corresponding
level of negative effect on the general aim ex- 
pressed in similar terms. 
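The two critical points can be made concrete with a small sketch. It assumes the effect of an incident has already been estimated as a signed number in some agreed unit, with positive values helping the general aim and negative values hurting it. The function name is hypothetical. The 15-minute values in the usage lines are borrowed, as an illustration only, from the quantitative production criterion the text suggests for certain situations. Real studies would set their own points.

```python
def classify_incident(effect, positive_point, negative_point):
    """Classify an incident by its estimated effect on the general aim.

    `effect` is signed: positive values help the aim, negative values hurt it.
    `positive_point` (> 0) and `negative_point` (< 0) are the two critical
    points that must be fixed in the plans before observation begins.
    """
    if positive_point <= 0 or negative_point >= 0:
        raise ValueError("need positive_point > 0 and negative_point < 0")
    if effect >= positive_point:
        return "critical: effective"
    if effect <= negative_point:
        return "critical: ineffective"
    return "not critical"


# Assumed criterion: 15 minutes of an average worker's production saved or wasted.
for minutes in (30, 5, -20):
    print(minutes, classify_incident(minutes, positive_point=15, negative_point=-15))
```

Keeping the two points as explicit parameters reflects the requirement that both ends of the importance scale be specified in advance rather than left to each observer.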

A definition which has been found useful is that an incident is critical if it makes a “significant” contribution, either positively or negatively, to the general aim of the activity. The definition of “significant” will depend on the nature of the activity. If the general aim of the activity is in terms of production, a significant contribution might be one which caused, or might have caused, an appreciable change in the daily production of the department either in the form of an increase or a decrease.

In certain specific situations, it might be desirable and possible to set up a quantitative criterion such as saving or wasting 15 minutes of an average worker’s production. In some situations, a definition of significance might be set up in terms of dollars saved or lost both directly and indirectly.

Actions which influence the attitudes of others are more difficult to evaluate objectively. Perhaps the best we might be able to do is to state it in terms of a probability estimate. For example, one such criterion might be that the minimum critical level would be an action that would have an influence such that at least one person in ten might change his vote on an issue of importance to the company.

d. Persons to make the observations. One additional set of specifications refers to the selection and training of the observers who are to make and report the judgments outlined in the steps above.

Wherever possible, the observers should be selected on the basis of their familiarity with the activity. Special consideration should be given to observers who have made numerous observations on persons engaged in the activity. Thus, for most jobs, by far the best observers are supervisors whose responsibility it is to see that the particular job being studied is done. However, in some cases very useful observations can be contributed by consumers of the products and services of the activity. For example, for a study of effective sales activities, the customers may have valuable data to contribute. For a study of effective parental activity, the children may be able to make valuable contributions.

In addition to careful selection of the persons to make observations, attention should be given to their training. Minimal training should include a review of the nature of the general aim of the activity and a study of the specifications and definitions for the judgments they will be required to make. Where the situation is complex or the observers are not thoroughly familiar with the activity, supervised practice in applying these definitions should be provided. This can be done by preparing descriptions of observations and asking the observers to make judgments about these materials. Their judgments can be immediately confirmed or corrected during such supervised practice periods.
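The supervised practice described above amounts to comparing each trainee's judgments of prepared descriptions with an agreed key and giving item-by-item feedback. The sketch below is a minimal illustration under that assumption. The function name, item ids, and labels are hypothetical, and the key would come from whatever judgments the study's planners agree on.

```python
def review_practice(key, trainee):
    """Compare a trainee observer's judgments with the agreed key.

    Both arguments map a prepared description's id to a judgment such as
    "critical: effective", "critical: ineffective", or "not critical".
    Returns the proportion correct and a feedback note for each item.
    """
    feedback = {}
    for item, expected in key.items():
        given = trainee.get(item)  # None if the trainee skipped the item
        if given == expected:
            feedback[item] = "confirmed"
        else:
            feedback[item] = f"corrected: expected {expected!r}, got {given!r}"
    correct = sum(1 for note in feedback.values() if note == "confirmed")
    return correct / len(key), feedback


# Hypothetical key and one trainee's judgments of three prepared descriptions.
key = {"p1": "critical: effective", "p2": "not critical", "p3": "critical: ineffective"}
trainee = {"p1": "critical: effective", "p2": "critical: effective", "p3": "critical: ineffective"}

score, notes = review_practice(key, trainee)
print(f"{score:.0%} correct")
for item, note in notes.items():
    print(item, note)
```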

In Fig. 2 is shown a form for use in develop- 
ing specifications regarding observations. The 
use of this form in making plans for the collec- 
tion of critical incidents or other types of ob- 
servational data should aid in objectifying 
these specifications. 

3. Collecting the Data

If proper plans and specifications are developed, the data collection phase is greatly simplified. A necessary condition for this phase is that the behaviors or results observed be evaluated, classified, and recorded while the facts are still fresh in the mind of the observer. It would be desirable for these operations to be performed at the time of observation so that all requisite facts could be determined and checked. Memory is improved if it is known in advance that the behavior to be observed is to be remembered. It is greatly improved if the specific aspects of what is to be observed are defined and if the operations to be performed with respect to evaluation and classification are clearly specified.

Specifications Regarding Observations
1. Persons to make the observations.
   a. Knowledge concerning the activity.
   b. Relation to those observed.
   c. Training requirements.
2. Groups to be observed.
   a. General description.
   b. Location.
   c. Persons.
   d. Times.
   e. Conditions.
3. Behaviors to be observed.
   a. General type of activity.
   b. Specific behaviors.
   c. Criteria of relevance to general aim.
   d. Criteria of importance to general aim (critical points).

Fig. 2. FORM FOR DEVELOPING SPECIFICATIONS REGARDING OBSERVATIONS
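The headings of the Fig. 2 form can be captured as a structured record, which makes it easy to check that every specification has been made explicit before data collection starts. This is a sketch only. The class and field names are hypothetical renderings of the form's headings, and a filled-in field is not the same thing as an adequate specification.

```python
from dataclasses import dataclass, fields


@dataclass
class ObservationSpec:
    # 1. Persons to make the observations
    observer_knowledge: str = ""
    observer_relation: str = ""
    observer_training: str = ""
    # 2. Groups to be observed
    group_description: str = ""
    location: str = ""
    persons: str = ""
    times: str = ""
    conditions: str = ""
    # 3. Behaviors to be observed
    activity_type: str = ""
    specific_behaviors: str = ""
    relevance_criteria: str = ""
    importance_criteria: str = ""  # the critical points

    def missing(self):
        """Names of headings that have not yet been filled in."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# A partially completed plan, using the classroom example from section 2.
spec = ObservationSpec(
    group_description="regularly employed teachers in a specified high school",
    location="classrooms",
    times="during class periods",
)
print(spec.missing())
```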

The critical incident technique is 
frequently used to collect data on 
observations previously made which 
are reported from memory. This is 
usually satisfactory when the incidents reported are fairly recent and the observers were motivated to make detailed observations and evaluations at the time the incident occurred.

The importance of obtaining re- 
cent incidents to insure that the in- 
cidents are representative of actual 
happenings was demonstrated in the 
study on air route traffic control- 
lers by Nagay (48) reported above. 
However, as also discussed in that 
study, in some situations adequate 
coverage cannot be obtained if only 
very recent incidents are included. 

Evidence regarding the accuracy 
of reporting is usually contained in 
the incidents themselves. If full and 
precise details are given, it can 
usually be assumed that this informa- 
tion is accurate. Vague reports sug- 
gest that the incident is not well re- 
membered and that some of the data 
may be incorrect. In several situations there has been an opportunity to compare the types of incidents reported under two conditions: (a) those reported from memory and without a list of the types of incidents anticipated, and (b) those reported when daily observations were being made in a routine work situation, and the evaluations and classifications were made and recorded on a prepared check list within 24 hours of the time of observation. The results of one such comparison were discussed briefly above in connection with the American Institute for Research study of factory employees.

During the observational period a 
negligible number of incidents were 
reported by the foremen as not fitting 
into the general headings included on 
the list. Although the proportions of 
incidents for the various items on the 
list are not identical, they are reason- 
ably close for most of the items. 
Items on such matters as meeting production requirements and accepting changes in jobs are higher for the recorded than for the recalled incidents. The fact that items
such as wasting time and assisting on 
problems are lower for the recalled 
incidents suggests that part of this 
discrepancy lies in the interpretations 
of the category definitions. The 
classifying of recorded incidents was 
done by the foremen, while the classi- 
fication of the recalled incidents was 
done by the research workers. In 
fairness, it should also be noted that 
the definitions used by the research 
workers were rewritten before they 
were incorporated in the foremen’s 
manuals. 
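The comparison above comes down to setting, category by category, the share of incidents produced by one collection method against the share produced by the other. A minimal sketch follows. The counts and the placeholder category labels "A", "B", "C" are invented for illustration and are not the figures from the American Institute for Research study.

```python
def proportions(counts):
    """Convert raw category counts into shares of the total."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def compare_sources(recorded, recalled):
    """Per category, the recorded share minus the recalled share."""
    p_recorded, p_recalled = proportions(recorded), proportions(recalled)
    categories = sorted(set(p_recorded) | set(p_recalled))
    return {c: p_recorded.get(c, 0.0) - p_recalled.get(c, 0.0) for c in categories}


# Invented counts, for illustration only.
recorded = {"A": 40, "B": 20, "C": 40}
recalled = {"A": 30, "B": 10, "C": 60}

for category, diff in compare_sources(recorded, recalled).items():
    print(f"{category}: {diff:+.2f}")
```

Because both sets of shares sum to one, the differences always sum to zero. A category that looks over-represented in one method is therefore always balanced by others that look under-represented, which is one reason differences in how category definitions are interpreted can show up as apparent discrepancies.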

On the whole, it seems reasonable 
to assume that, if suitable precau- 
tions are taken, recalled incidents can 
be relied on to provide adequate data 
for a fairly satisfactory first approxi- 
mation to a statement of the require- 
ments of the activity. Direct obser- 
vations are to be preferred, but the 
efficiency, immediacy, and minimum 
demands on cooperating personnel 
which are achieved by using recalled 
incident data frequently make their 
use the more practical procedure. 

Another practical problem in col- 
lecting the data for describing an 
activity refers to the problem of how 
it should be obtained from the ob- 
servers. This applies especially to 
the problem of collecting recalled 
data in the form of critical incidents. 
Four procedures have been used and 
will be discussed briefly below: 


a. Interviews. The use of trained personnel 
to explain to observers precisely what data are 
desired and to record the incidents, making 
sure that all necessary details are supplied, is 
probably the most satisfactory data collection 
procedure. This type of interview is somewhat 
different from other sorts of interviews and a 
brief summary of the principal factors involved 
will be given. 

(i) Sponsorship of the study. If a stranger to the observers is collecting the data, it is ordinarily desirable to indicate on what authority the interview is being held. This part should be as brief as possible to avoid any use of time for a prolonged discussion of a topic irrelevant to the purpose of the interview. In many instances all that needs to be said is that someone known and respected by the observer has suggested the interview.

(ii) Purpose of the study. This should also 
be brief and ordinarily would merely involve a 
statement that a study was being made to de- 
scribe the requirements of the activity. This 
would usually be cast in some such informal form as, “We wish to find out what makes a good citizen,” or, “We are trying to learn in detail just what successful work as a nurse includes.” In cases where there is some hesitation about cooperating or a little more explanation seems desirable, a statement can be added concerning the value and probable uses of the results. This frequently takes the form of improving selection and training procedures. In some instances, it would involve improving the results of the activity. For example, the interviewer might say, “In order to get better sales clerks we need to know just what they do that makes them especially effective or ineffective,” or, “If parents are to be more effective, we need to be able to tell them the things they do that are effective and ineffective.”

(iii) The group being interviewed. If there is any likelihood of a person feeling, “But, why ask me?” it is desirable to forestall this by pointing out that he is a member of a group which is in an unusually good position to observe and report on this activity. The special qualifications of members of this group as observers can be mentioned briefly, as, “Supervisors such as yourself are constantly observing and evaluating the work of switchboard girls,” or, “Students are in an unusually good position to observe the effectiveness of their teachers in a number of ways.”

(iv) The anonymity of the data. Especially for the collection of information about ineffective behavior, one of the principal problems is to convince the observer that his report cannot harm the person reported on in any way. Usually he also needs to be convinced that the person reported on will never know that he has reported the incident. Assurances are not nearly so effective in this situation as actual descriptions of techniques to be used in handling the data, which enable the observer to judge for himself how well the anonymity of the data will be guarded. Under no circumstances should the confidences of the reportees be violated in any way. The use of sealed envelopes, avoidance of identifying information, the mailing of data immediately to a distant point for analysis, and similar techniques are helpful in establishing the good faith of the interviewer in taking all possible precautions to safeguard the incidents reported.
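“Avoidance of identifying information” can be made mechanical when incident records are kept in structured form. Here is a minimal sketch that returns a copy of a record with identifying fields dropped before the record is sent away for analysis. The field names are hypothetical, and a real study would list whatever identifying items its own forms collect. Stripping named fields does not guard against identification through the incident's narrative itself, so that still needs human review.

```python
# Hypothetical field names; a real study would list whatever its forms collect.
IDENTIFYING_FIELDS = {"observer_name", "subject_name", "employee_number"}


def strip_identifiers(incident):
    """Return a copy of an incident record with identifying fields removed.

    The original record is left untouched, so it can be sealed or destroyed
    separately under the study's own safeguards.
    """
    return {key: value for key, value in incident.items()
            if key not in IDENTIFYING_FIELDS}


record = {
    "observer_name": "(name)",
    "subject_name": "(name)",
    "employee_number": "(number)",
    "behavior": "Checked every gauge before restarting the line.",
    "judgment": "effective",
}
print(strip_identifiers(record))
```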

(v) The question. The most crucial aspect 
of the data collection procedure is the ques- 
tions asked the observers. Many studies have 
shown that a slight change in wording may 
produce a substantial change in the incidents 
reported. For example, in one study the last part of one of the specific questions asked was, “Tell just how this employee behaved which caused a noticeable decrease in production.” This question resulted in almost all incidents reported having to do with personality and attitude behaviors. This part of the question was changed to, “Tell just what this employee did which caused a noticeable decrease in production.” This second question produced a much broader range of incidents. To the question writer “how he behaved” and “what he did” seemed like about the same thing. To the foremen who were reporting incidents “how he behaved” sounded as if personality and attitudes were being studied. The subtle biases
involved in the wording of questions are not 
always so easily found. Questions should al- 
ways be tried out with a small group of typical 
observers before being put into general use in 
a study. 

The question should usually refer briefly to 
the general aim of the activity. This aim might 
be discussed more fully in a preliminary sen- 
tence. It should usually state that an incident, 
actual behavior, or what the person did is desired. It should briefly specify the type of behavior which is relevant and the level of importance which it must reach to be reported. It should also tie down the selection of the incidents to be reported by the observer in some way, such as asking for the most recent observation, in order to prevent the giving of only the more dramatic or vivid incidents, or some other selected group, such as those which fit the observer’s stereotypes.

An effective procedure for insuring that the interpretation of the persons being interviewed is close to that intended is to request a sample of persons typical of those to be interviewed to state in their own words what they understand they have been asked to do. These persons should be selected so as to represent all types who will be interviewed. From a study of their interpretations, necessary revisions can be made to insure that all interviewees will be in agreement as to the nature of the incidents they are to provide.

(vi) The conversation. The interviewer should avoid asking leading questions after the main question has been stated. His remarks should be neutral and permissive and should show that he accepts the observer as the expert. By indicating that he understands what is being said and permitting the observer to do most of the talking, the interviewer can usually get unbiased incidents. If the question does not seem to be understood, it can be repeated with some reference to clarifying just what is meant by it. If the observer has given what seems like only part of the story, he should be encouraged by restating the essence of his remarks. This usually tends to encourage him to continue and may result in his bringing out many relevant details that the interviewer did not know the situation well enough to ask for. In some cases, it is desirable to have the interviews recorded electrically and transcribed. This increases the work load substantially, however, and trained interviewers can usually get satisfactory reports at the time or by editing their notes shortly after the interview.

Usually the interviewer should apply certain criteria to the incidents while they are being collected. Some of the more important criteria are: (a) is the actual behavior reported; (b) was it observed by the reporter; (c) were all relevant factors in the situation given; (d) has the observer made a definite judgment regarding the criticalness of the behavior; (e) has the observer made it clear just why he believes the behavior was critical.
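The five criteria (a) to (e) above can serve as a checklist that flags incomplete incident records while the observer is still available. The sketch below assumes a hypothetical record layout with the fields behavior, observed_by_reporter, circumstances, judgment, and reason. Its checks only confirm that something was recorded for each criterion. Whether, for instance, all relevant factors were really given remains the interviewer's judgment.

```python
# Each criterion: (description, check). The checks only confirm that
# something was recorded; judging completeness is left to the interviewer.
CRITERIA = {
    "a": ("actual behavior reported",
          lambda inc: bool(inc.get("behavior", "").strip())),
    "b": ("observed by the reporter",
          lambda inc: inc.get("observed_by_reporter") is True),
    "c": ("relevant factors in the situation given",
          lambda inc: bool(inc.get("circumstances", "").strip())),
    "d": ("definite judgment of criticalness",
          lambda inc: inc.get("judgment") in {"effective", "ineffective"}),
    "e": ("reason the behavior was critical",
          lambda inc: bool(inc.get("reason", "").strip())),
}


def unmet_criteria(incident):
    """List the criteria an incident record does not yet satisfy."""
    return [f"({key}) {desc}" for key, (desc, check) in CRITERIA.items()
            if not check(incident)]


# Hypothetical record with the circumstances left blank.
incident = {
    "behavior": "Stayed late to clear a jammed machine before the night shift.",
    "observed_by_reporter": True,
    "circumstances": "",
    "judgment": "effective",
    "reason": "The night shift started on time.",
}
print(unmet_criteria(incident))
```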


In Fig. 3 is shown a sample of the type of form used by interviewers to collect critical incidents. Of course the form must be adapted to the needs of the specific situation.

“Think of the last time you saw one of your subordinates do something that was very helpful to your group in meeting their production schedule.” (Pause till he indicates he has such an incident in mind.) “Did his action result in an increase in production of as much as one per cent for that day, or some similar period?”

(If the answer is “no,” say) “I wonder if you could think of the last time that someone did something that did have this much of an effect in increasing production.” (When he indicates he has such a situation in mind, say) “What were the general circumstances leading up to this incident?”

“Tell me exactly what this person did that was so helpful at that time.”

“Why was this so helpful in getting your group’s job done?”

“When did this incident happen?”

“What was this person’s job?”

“How long has he been on this job?”

“How old is he?”

Fig. 3. SAMPLE OF A FORM FOR USE BY AN INTERVIEWER IN COLLECTING EFFECTIVE CRITICAL INCIDENTS

b. Group interviews. Because of the cost in time and personnel of the individual interview, a group interview technique has been developed. This retains the advantages of the individual interview in regard to the personal contact, explanation, and availability of the interviewer to answer questions. To some extent it also provides for a check on the data supplied by the interviewees. Its other advantages are that the language of the actual observer is precisely reproduced and the time for editing the interviews is virtually eliminated.

The method consists of having the interviewer give his introductory remarks to a group very much as he would do in an individual interview. There is an opportunity for questions and clarification. Then each person is asked to write incidents in answer to specific questions contained on a specially prepared form. The size of the group which can be handled effectively will vary with the situation. If the group is fairly small, it is usually possible for the interviewer to read the responses of each member of the group to the first question and make sure that he understands what is wanted. There seems to be a certain amount of social facilitation, and the results in most situations have been excellent. In the report of the first use of this procedure by Wagner (65), the amount of interviewer time required per usable incident was 4.3 minutes for the group interview procedure as compared with 15.7 minutes for individual interviews. The quality of these incidents, obtained from officers in the United States Air Force, appeared to be about the same for the two situations.
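The per-incident times reported by Wagner (65) translate directly into planning estimates of interviewer time. The sketch below assumes, which may well not hold for a different population or activity, that those times carry over unchanged to a new study. The target of 1,000 usable incidents is a hypothetical figure chosen for illustration.

```python
# Minutes of interviewer time per usable incident, as reported by Wagner (65).
GROUP_MINUTES = 4.3
INDIVIDUAL_MINUTES = 15.7


def interviewer_hours(n_incidents, minutes_per_incident):
    """Interviewer time, in hours, needed to collect n usable incidents."""
    return n_incidents * minutes_per_incident / 60


target = 1000  # hypothetical number of usable incidents wanted
group = interviewer_hours(target, GROUP_MINUTES)
individual = interviewer_hours(target, INDIVIDUAL_MINUTES)
print(f"group interviews:      {group:.1f} h")
print(f"individual interviews: {individual:.1f} h")
print(f"individual / group:    {INDIVIDUAL_MINUTES / GROUP_MINUTES:.2f}")
```

On these assumptions, 1,000 incidents would take roughly 72 hours of interviewer time in groups against roughly 262 hours individually, a ratio of about 3.7.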

c. Questionnaires. If the group becomes 
large, the group interview procedure is more in 
the nature of a questionnaire procedure. 
There are, of course, all types of combinations 
of procedures that can be used. The one that 
is most different from those discussed is the 
mailed questionnaire. In situations where the 
observers are motivated to read the instruc- 
tions carefully and answer conscientiously, 
this technique seems to give results which are 
not essentially different from those obtained 
by the interview method. Except for the ad- 
dition of introductory remarks, the forms used 
in collecting critical incidents by means of 
mailed questionnaires are about the same as 
those used in group interviews. 

d. Record forms. One other procedure for
collecting data is by means of written records.
There are two varieties of recording. One is to
record details of incidents as they happen.
This situation is very similar to that described
in connection with obtaining incidents by
interviews above, except that, after the
introductory remarks and the presentation of
the questions, the observation and reporting
of incidents are delayed until an incident is
observed to happen.

The other, a variation of this procedure, is to
record such incidents by placing a check or
tally in the appropriate place on forms that
describe most of the possible types of incidents.

As additional information becomes available
on the nature of the components which
make up activities, observers may be able to
collect data more efficiently by using forms
for recording and classifying observations. In
the meantime, because the information
currently available regarding these components
is inadequate, it seems desirable to ask
observers to report their observations in
greater detail and have the classification done
by specially trained personnel.

Size of sample. A general problem which 
overlaps the phases of collecting the incidents 
and analyzing the data relates to the number 
of incidents required. There does not appear 
to be a simple answer to this question. If the 
activity or job being defined is relatively sim- 
ple, it may be satisfactory to collect only 50 or 
100 incidents. On the other hand, some types 
of complex activity appear to require several 
thousand incidents for an adequate statement 
of requirements. 

The most useful procedure for determining 
whether or not additional incidents are needed 
is to keep a running count on the number of 
new critical behaviors added to the classification
system with each additional 100 incidents.
For most purposes, it can be considered
that adequate coverage has been achieved 
when the addition of 100 critical incidents to 
the sample adds only two or three critical be- 
haviors. For jobs of a supervisory nature, it 
appears that between 2,000 and 4,000 critical 
incidents are required to establish a compre- 
hensive statement of requirements that in- 
cludes nearly all of the different types of criti- 
cal behaviors. For semiskilled and skilled jobs 
between 1,000 and 2,000 incidents seem to be 
adequate to cover the critical behaviors. 
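The running-count rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not a procedure given in the article: it assumes each incident has already been classified under a single critical-behavior label (the labels and the function names `new_behaviors_per_block` and `coverage_adequate` are hypothetical). It counts how many labels appear for the first time in each successive block of 100 incidents and treats coverage as adequate once the latest block adds no more than three new behaviors.

```python
from typing import List, Sequence


def new_behaviors_per_block(labels: Sequence[str], block_size: int = 100) -> List[int]:
    """Count critical behaviors seen for the first time in each complete block.

    `labels` holds one (hypothetical) behavior label per classified incident,
    in the order the incidents were collected. A trailing partial block is ignored.
    """
    seen = set()
    counts = []
    full = (len(labels) // block_size) * block_size
    for start in range(0, full, block_size):
        block = set(labels[start:start + block_size])
        new = block - seen  # behaviors absent from every earlier block
        seen |= new
        counts.append(len(new))
    return counts


def coverage_adequate(counts: Sequence[int], threshold: int = 3) -> bool:
    """True when the most recent block added at most `threshold` new behaviors."""
    return len(counts) > 0 and counts[-1] <= threshold


if __name__ == "__main__":
    # Hypothetical data: 100 distinct behaviors, then a block adding only two new ones.
    labels = [f"b{i}" for i in range(100)] + ["b1"] * 98 + ["x", "y"]
    counts = new_behaviors_per_block(labels)
    print(counts, coverage_adequate(counts))  # prints: [100, 2] True
```

The threshold of three mirrors the “two or three” figure in the text, which is offered as a rule of thumb for most purposes rather than a fixed cutoff; coverage is also not the only criterion for stopping.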

Coverage of all or nearly all of the various 
critical behaviors is not the only criterion as to 
whether or not a sufficient number of critical 
incidents has been collected. If a relatively 
precise definition of each critical behavior cat- 
egory is required, it may be necessary to get at 
least three or four examples of each critical
behavior. Similarly, if the critical incidents are
to be used as a basis for developing selection
tests, training materials, and proficiency
measures, more incidents may be required to
provide a sufficient supply of usable ideas for
the development of these materials.

In summary, although there is no simple
formula for determining the number of critical
incidents that will be required, this is a very
important consideration in the plan of the
study. Checks should be made both on the first
hundred or so incidents and again after
approximately half of the number of incidents
believed to be required have been obtained, so
that the preliminary estimates can be revised,
if necessary, with a minimum loss of effort
and time.


4. Analyzing the Data 


The collection of a large sample of
incidents that fulfill the various con-
ditions outlined above provides a
functional description of the activity
in terms of specific behaviors. If the
sample is representative, the judges
well qualified, the types of judgments
appropriate and well defined, and the
procedures for observing and reporting
such that incidents are reported
accurately, the stated requirements
can be expected to be comprehensive,
detailed, and valid in this form.
There is only one reason for going
further, and that is practical utility.
The purpose of the data analysis
stage is to summarize and describe
the data in an efficient manner so
that they can be effectively used for
many practical purposes.

In the discussion which follows, it
should be kept in mind that the
process of description has been
completed. The specific procedures to be
discussed are not concerned with
improving on the comprehensiveness,
specificity of detail, or validity of the
statement of the requirements of the
activity. Rather, they are concerned
with making it easier to report these
requirements, to draw inferences
from them, and to compare the
activity with other activities.

The aim is to increase the useful-
ness of the data while sacrificing as
little as possible of their compre-
hensiveness, specificity, and validity.
It appears that there are three
primary problems involved: (a) the
selection of the general frame of
reference that will be most useful for
describing the incidents; (b) the
inductive development of a set of
major area and subarea headings;
and (c) the selection of one or more
levels along the specificity-generality
continuum to use in reporting the
requirements. Each of these problems
will be discussed below.

a. Frame of reference. There are countless
ways in which a given set of incidents can be
classified. In selecting the general nature of
the classification, the principal consideration
should usually be that of the uses to be made
of the data. The preferred categories will be
those believed to be most valuable in using the
statement of requirements. Other considera-
tions are ease and accuracy of classifying the
data, relation to previously developed defini-
tions or classification systems, and considera-
tions of interpretation and reporting, which
will be discussed in a later section.

For job activities, the choice of a frame of 
reference is usually dominated by
considerations of whether the principal use of
the requirements will be in relation to selection,
training, measurement of proficiency, or the
development of procedures for evaluating
on-the-job effectiveness. For selection purposes,
the most appropriate classification system is a 
psychological one. The main headings have to 
do with types of psychological traits that are 
utilized in the selection process. For training 
uses, the best classification system follows a 
set of headings that is easily related to training 
courses or broad training aims. For pro- 
ficiency measurement, the headings tend to be 
similar to those for training except that there 
is less attention to possible course organization 
and aims and greater attention to the com- 
ponents of the job as it is actually performed. 
For the development of procedures for evalu- 
ating on-the-job effectiveness to establish a 
criterion of success, the classification system is 
necessarily directed at presenting the
on-the-job behaviors under headings that
either represent well-marked phases of the job
or provide a simple framework for classifying
on-the-job activities that is either familiar to
or easily learned by supervisors.

Similarly, in nonvocational activities the 
frame of reference depends on the uses planned 
for the findings. For example, if a study is be- 
ing made to define immaturity reactions in 
military personnel, the frame of reference 
would depend somewhat on whether the func- 
tional description is to be used primarily to 
identify personnel showing this type of malad- 
justment or whether the principal use will be 
to try to prepare specifications for types of 
situations in which immaturity reactions
would not lead to serious difficulties.

b. Category formulation. The induction of
categories from the basic data in the form of
incidents is a task requiring insight,
experience, and judgment. Unfortunately, this
procedure is, in the present stage of psychological
knowledge, more subjective than objective.
No simple rules are available, and the quality
and usability of the final product are largely
dependent on the skill and sophistication of
the formulator. One rule is to submit the
tentative categories to others for review.
Although there is no guarantee that results
agreed on by several workers will be more
useful than those obtained from a single worker,
the confirmation of judgments by a number of
persons is usually reassuring. The usual
procedure is to sort a relatively small sample of
incidents into piles that are related to the
frame of reference selected. After these
tentative categories have been established, brief
definitions of them are made, and additional
incidents are classified into them. During this
process, needs for redefinition and for the
development of new categories are noted. The
tentative categories are modified as indicated
and the process continued until all the
incidents have been classified.

The larger categories are subdivided into 
smaller groups and the incidents that describe 
very nearly the same type of behavior are 
placed together. The definitions for all the 
categories and major headings should then be 
re-examined in terms of the actual incidents 
classified under each. 

c. General behaviors. The last step is to de- 
termine the most appropriate level of specific- 
ity-generality to use in reporting the data. 
This is the problem of weighing the advan- 
tages of the specificity achieved in specific inci- 
dents against the simplicity of a relatively 
small number of headings. The level chosen 
might be only a dozen very general behaviors
or it might be several hundred rather specific 
behaviors. Practical considerations in the im- 
mediate situation usually determine the opti- 
mal level of generality to be used. 
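One way to keep the choice of generality level open is to record each incident's classification as a path through the heading hierarchy (major area, then subarea, then specific behavior) and to count incidents at whatever depth is chosen for the report. The sketch below assumes that representation; the function name `roll_up` and all heading names in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

# A classification path, e.g. (major area, subarea, specific behavior).
Path = Tuple[str, ...]


def roll_up(classified: Iterable[Path], level: int) -> Counter:
    """Count incidents at the chosen depth of the heading hierarchy.

    level=1 reports by major area; larger values report more specific
    behaviors. Paths shorter than `level` are counted at their full length.
    """
    if level < 1:
        raise ValueError("level must be at least 1")
    return Counter(tuple(path[:level]) for path in classified)


if __name__ == "__main__":
    # Hypothetical classified incidents.
    incidents = [
        ("Planning", "Schedules work", "Sets deadlines"),
        ("Planning", "Schedules work", "Sets deadlines"),
        ("Planning", "Anticipates problems", "Orders spare parts"),
        ("Working with others", "Gives instructions", "Explains reasons"),
    ]
    print(roll_up(incidents, 1))  # counts by major area
    print(roll_up(incidents, 3))  # counts by specific behavior
```

Reporting at a coarse level trades the detail of specific behaviors for a short list of headings; because the full paths are kept, the same data can be re-reported at a finer level without reclassifying the incidents.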

Several considerations should be kept in 
mind in establishing headings for major areas 
and in stating critical requirements at the
selected level of generality. These are listed
below:

(i) The headings and requirements should 
indicate a clear-cut and logical organization. 
They should have a discernible and easily
remembered structure.

(ii) The titles should convey meanings in 
themselves without the necessity of detailed 
definition, explanation, or differentiation. 
This does not mean that they should not be 
defined and explained. It does mean that these 
titles, without the detailed explanation, 
should still be meaningful to the reader. 

(iii) The list of statements should be ho- 
mogeneous; i.e., the headings for either areas 
or requirements should be parallel in content 
and structure. Headings for major areas 
should be neutral, not defining either
unsatisfactory or outstanding behaviors. Critical
requirements should ordinarily be stated in
positive terms.

(iv) The headings of a given type should all 
be of the same general magnitude or level of 
importance. Known biases in the data causing 
one area or one requirement to have a dispro- 
portionate number of incidents should not be 
reflected in the headings.

(v) The headings used for classification and 
reporting of the data should be such that find- 
ings in terms of them will be easily applied and 
maximally useful. 

(vi) The list of headings should be
comprehensive and cover all incidents having
significant frequencies.


5. Interpreting and Reporting 

It is never possible in practice to 
obtain an ideal solution for each of 
the practical problems involved in 
obtaining a functional description of 
an activity. Therefore, the statement
of requirements as obtained needs
interpretation if it is to be used
properly. In many cases, the real
errors are made not in the collection
and analysis of the data but in the
failure to interpret them properly.
Each of the four preceding steps, (a)
the determination of the general aim,
(b) the specification of observers,
groups to be observed, and
observations to be made, (c) the data
collection, and (d) the data analysis,
must be studied to see what biases
have been introduced by the procedures
adopted. If there is a division of
opinion as to the general aim and one 
of the competing aims is selected, 
this should be made very clear in the 
report. If the groups on whom the 
observations are made are not repre- 
sentative of the relevant groups in- 
volved, they must be described as 
precisely as possible. The aim of the 
study is usually not a functional de- 
scription of the activity as carried on 
by this sample but rather a state- 
ment relating to all groups of this 
type. In order to avoid faulty in- 
ferences and generalizations, the limi- 
tations imposed by the group must 
be brought into clear focus. Simi- 
larly, the nature of judgments made 
in collecting and analyzing the data 
must be carefully reviewed. 

While the limitations need to be 
clearly reported, the value of the 
results should also be emphasized. 
Too often the research worker shirks 
his responsibility for rendering a 
judgment concerning the degree of 
credibility which should be attached 
to his findings. It is a difficult task,
but if the results are to be used,
someone will have to make such a
judgment, and the original
investigator is best prepared to make the
necessary evaluations either for the
general case or for certain typical
specific examples.


USES OF THE CRITICAL INCIDENT TECHNIQUE


The variety of situations in which
the collection of critical incidents will
prove of value has been only par-
tially explored. In the approximately
eight years since the writer and his
colleagues began a systematic
formulation of principles and procedures
to be followed in collecting this type
of data, a fairly large number of
applications has been made. The
applications will be discussed under the
following nine headings: (a) measures
of typical performance (criteria);
(b) measures of proficiency (standard
samples); (c) training; (d) selection
and classification; (e) job design and
purification; (f) operating procedures;
(g) equipment design; (h) motivation
and leadership (attitudes); (i) coun-
seling and psychotherapy.

Space is not available here to de- 
scribe these various applications in 
detail. However, a brief description 
of the types of application that have 
been made, along with brief illustra- 
tive examples and references, will be 
presented. Some of the studies in- 
volve several of the types of applica- 
tions to be discussed. The presenta- 
tion is not intended to be complete, 
but rather to give the reader inter- 
ested in further study some orienta- 
tion and guidance. 

Measures of typical performance 
(criteria). The simplest and most 
natural application of a systematic- 
ally collected set of critical incidents 
is in terms of the preparation of a
statement of critical requirements
and a check list or some similar type
of procedure for evaluating the typi- 
cal performance of persons engaged 
in this activity. If an observational 
check list that includes all of the im- 
portant behaviors for the activity is 
available, the performance of the in- 
dividual can be objectively evaluated 
and recorded by merely making a 
single tally mark for each observa- 
tion. Such records provide the essen- 
tial basis for criterion data which are 
sufficiently detailed and specific for 
special purposes but at the same time 
can be combined into a single over-all 
evaluation when this is desirable. 
Such a procedure was first suggested 
and tried out in connection with de- 
velopmental studies of the American 
Institute for Research. These in- 
cluded: Preston's study of officers for 
the United States Air Force (52); 
Nagay’s study on air route traffic 
controllers for the Civil Aeronautics 
Administration (49); and M. H. 
Weislogel’s study on research per- 
sonnel for the Office of Naval Re- 
search (69). Another American In- 
stitute for Research study was re- 
ported by R. B. Miller and the 
present author (21). This was a per- 
formance record form for hourly 
wage employees developed in coopera- 
tion with personnel of the Delco- 
Remy Division of the General Motors 
Corporation, the Employment Prac- 
tices Division of that corporation, 
and the Industrial Relations Center 
of the University of Chicago. The 
same authors have developed similar 
performance records for salaried em- 
ployees, and foremen and supervisors 
(22, 23). The principles and pro- 
cedures underlying this type of evalu- 
ation of performance have been pub- 
lished elsewhere (14, 15, 17). 
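The tally-mark record described above can be represented very simply. The sketch below is an illustration under stated assumptions, not the form used in the studies cited: it assumes each tally names a checklist behavior and marks the observed act as effective or ineffective, keeps the detailed per-behavior counts, and, when a single over-all evaluation is wanted, combines them as the proportion of effective tallies. That combining rule and all names (`Tally`, `summarize`, `overall_index`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Tally:
    behavior: str    # checklist item (hypothetical label)
    effective: bool  # True if the observed act was effective


def summarize(tallies: Iterable[Tally]) -> Dict[str, Counter]:
    """Detailed record: effective/ineffective counts for each checklist behavior."""
    record: Dict[str, Counter] = {}
    for t in tallies:
        key = "effective" if t.effective else "ineffective"
        record.setdefault(t.behavior, Counter())[key] += 1
    return record


def overall_index(tallies: Iterable[Tally]) -> Optional[float]:
    """One possible over-all evaluation: the share of tallies marked effective."""
    marks = list(tallies)
    if not marks:
        return None
    return sum(t.effective for t in marks) / len(marks)


if __name__ == "__main__":
    observations = [
        Tally("Plans work in advance", True),
        Tally("Plans work in advance", False),
        Tally("Helps co-workers", True),
        Tally("Helps co-workers", True),
    ]
    print(summarize(observations))
    print(overall_index(observations))  # 0.75
```

Keeping the per-behavior counts separate from the combined index reflects the point in the text: the record stays detailed enough for special purposes while still allowing a single over-all figure.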

A number of important contribu- 
tions to the development of func- 
tional descriptions and standards of 
performance have been made by
other groups using the critical inci- 
dent technique. One of the most 
notable of these is the development 
by Hobbs et al. (3, 31), of Ethical 
Standards of Psychologists. More 
than 1,000 critical incidents involving 
ethical problems of psychologists 
were contributed by the members of 
the American Psychological Associa- 
tion. It is believed that this repre- 
sents the first attempt to use empiri- 
cal methods to establish ethical 
standards. Because of the importance
of this study, and the generality
of some of the problems involved,
certain of the conclusions reported by
the Committee on Ethical Standards
for Psychology in their introductory
statement will be quoted here.


First, it is clear that psychologists believe 
that ethics are important; over two thousand 
psychologists were sufficiently concerned with 
the ethical obligations of their profession to 
contribute substantially to the formulation of 
these ethical standards. Second, psychologists 
believe that the ethics of a profession cannot be 
prescribed by a committee; ethical standards 
must emerge from the day-by-day value com- 
mitments made by psychologists in the prac- 
tice of their profession. Third, psychologists 
share a conviction that the problems of men, 
even those involving values, can be studied 
objectively; this document summarizes the re- 
sults of an effort to apply some of the tech- 
niques of social science to the study of ethical 
behavior of psychologists. Fourth, psycholo- 
gists are aware that a good code of ethics must 
be more than a description of the current sta- 
tus of ethics in the profession; a code must em- 
body the ethical aspirations of psychologists 
and encourage changes in behavior, bringing 
performance ever closer to aspiration. Fifth, 
psychologists appreciate that process is often 
more important than product in influencing 
human behavior; the four years of
widely-shared work in developing this code are counted
on to be more influential in changing ethical 
practices of psychologists than will be the pub- 
lication of this product of their work. Finally, 
psychologists recognize that the process of 
studying ethical standards must be a continu- 
ing one; occasional publications such as this 
statement mark no point of conclusion in the 
ongoing process of defining ethical standards 
—they are a means of sharing the more
essential discipline of examining professional
experience, forming hypotheses about professional
conduct, and testing these hypotheses by ref- 
erence to the welfare of the people affected by 
them (3, p. v). 


In addition to the study by Smith
mentioned in a previous section (58), 
several other studies on the use of the 
critical incident procedures as a 
basis for evaluating teaching effec- 
tiveness have been reported. One of 
these was a study conducted under 
the joint sponsorship of the Educa- 
tional Research Corporation and the 
Harvard University Graduate School 
of Education with funds provided by 
the New England School Develop- 
ment Council and the George F. 
Milton Fund. This was an explora- 
tory study of teacher competence 
reported by Domas (6). Approxi- 
mately 1,000 critical incidents were 
collected from teachers, principals, 
and other supervisors. Although this 
was an exploratory study, it was felt 
that it made an important contribu- 
tion to the general problem of relat- 
ing salary to teacher competence. 

The second of these studies was 
conducted as part of the teacher 
characteristics study sponsored by 
the American Council on Education 
and subsidized by the Grant Founda- 
tion. This study is reported by 
Jensen (32). Teachers, administra- 
tors, and teachers in training in the 
Los Angeles area contributed more 
than 1,500 critical incidents of teacher 
behavior. The incidents were classi- 
fied under personal, professional, and 
social qualities. The category formu- 
lation indicated that there were about 
20 distinct critical requirements. 
These were recommended as a basis 
for teacher evaluation and as an aid 
to the in-service growth of teachers. 

Another study was that of Smith 
and Staudohar (59), which
determined the critical requirements for
basic training of tactical instructors
in the United States Air Force. From
130 training supervisors, 555 tactical
instructors, and 3,082 basic trainees,
a total of 6,615 usable incidents were
obtained. The authors comment
that:

The training supervisors report a
predominance of ineffective incidents in the major
areas of: Sets a good example and maintains
effective personal relations. The tactical
instructors report more ineffective incidents in
the area of Makes his expectations clear. Basic
trainees show a predominance of ineffective
incidents in three areas: Sets a good example,
Considers trainee’s needs, and Maintains
effective personal relations (59, p. 5).
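The comparison quoted above amounts to cross-tabulating incidents by reporting group and major area. A minimal sketch, assuming each incident is recorded as a (reporting group, area, effective?) triple and reading “predominance of ineffective incidents” simply as more ineffective than effective incidents in that area for that group. Both the record layout and that reading are assumptions, not the authors' stated method, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (reporting group, major area, True if the incident was effective)
Incident = Tuple[str, str, bool]


def predominantly_ineffective(incidents: Iterable[Incident]) -> Dict[str, List[str]]:
    """For each reporting group, list areas where ineffective incidents outnumber effective ones."""
    counts: Dict[Tuple[str, str], List[int]] = defaultdict(lambda: [0, 0])
    for group, area, effective in incidents:
        counts[(group, area)][0 if effective else 1] += 1  # [effective, ineffective]
    result: Dict[str, List[str]] = defaultdict(list)
    for (group, area), (n_eff, n_ineff) in counts.items():
        if n_ineff > n_eff:
            result[group].append(area)
    return {group: sorted(areas) for group, areas in result.items()}
```

Groups whose incidents in every area are mostly effective simply do not appear in the result.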


Another study on the evaluation of 
instructor effectiveness was carried 
out by Konigsburg (33). This study 
involved the development of an
instructor check list for college
instructors based on the critical
incident technique and a comparison
of techniques for recording
observations. One of its principal findings
was the very low correlation between
the total scores from the
Purdue Rating Scale for Instruction
and the instructor check list. When
these two instruments were each
given to half the class on the same
day, the average correlation
coefficient was found to be .29. The other
principal finding is that the planned
performances of a total of 46
predetermined behaviors were better
reflected by the results obtained on
the instructor check list than by the
results on the Purdue Rating Scale.

A somewhat related study has been
reported by Barnhart (4). This
study collected a large number of
critical incidents for the purpose of
establishing the critical requirements
for school board membership. The
author applied his findings to the 
problem of evaluating the effective- 
ness of school board members. 

Another type of application of the
critical incident technique to the de- 
velopment of bases for evaluating be- 
havior is the previously mentioned 
study of Eilbert (7). His list of 51 
types of immature reaction based on 
a collection of several hundred criti- 
cal incidents describing manifesta- 
tions of emotional immaturity is be- 
lieved to provide a useful guide to 
further investigation and appraisal of 
persons with behavior problems. It 
is believed that the results of this 
study provide substantial encourage- 
ment to the application of the critical 
incident technique to similar prob- 
lems in the field of clinical diagnosis 
and evaluation. 

Measures of proficiency (standard
samples). A closely related use of
critical incidents is to provide a basis
for evaluating the performance of
persons by use of standard samples of
behavior involving important
aspects of the activity. Such
evaluations are called proficiency measures
and are differentiated from the evalu- 
ation of typical performance on the 
job primarily on the basis that a test 
situation rather than a real job situa- 
tion is used. Measures of this sort are 
especially useful at the end of train- 
ing courses as checks on the main- 
tenance of proficiency, and when the 
tasks assigned to participants vary a 
great deal in difficulty or are not di- 
rectly observed by the supervisors. 

One of the first applications of
critical incidents to the development 
of proficiency measures was Gordon's 
study on the development of a stand- 
ard flight check for the airline trans- 
port rating (28, 29). This study was 
done by the American Institute for 
Research under the sponsorship of 
the National Research Council Com- 
mittee on Aviation Psychology with 
funds provided by the Civil Aero- 
nautics Administration. In this 
study data from analyses of airline
accidents were combined with critical 
incidents reported by airline pilots to 
provide the basis for developing an 
objective measure of pilot profi- 
ciency. The flight check consisted of 
the presentation of situations pro- 
viding uniformly standardized op- 
portunities to perform the critical 
aspects of the airline pilot's job as 
indicated from the study of the acci- 
dents and critical incidents reported. 
The new check was found to yield 88 
per cent agreement on the decision to 
pass or fail a particular pilot when
examined on flights on successive
days by different check pilots. The
previous flight check when used on
the same flights gave only 63 per cent
agreement, which was little better
than chance under the conditions of
the study.
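The agreement figures above are the proportion of examined pilots for whom two check pilots reached the same pass/fail decision. The article does not say how the chance level was determined; one common approach, sketched below as an assumption, is the agreement expected if the two examiners decided independently at their own observed pass rates (the expected-agreement term used in Cohen's kappa). The function names and the example decisions are hypothetical.

```python
from typing import Sequence


def percent_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Share of cases on which two examiners reached the same decision."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty decision lists")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def chance_agreement(a: Sequence[str], b: Sequence[str]) -> float:
    """Agreement expected if each examiner decided independently at his own rates."""
    n = len(a)
    if len(b) != n or n == 0:
        raise ValueError("need two equal-length, non-empty decision lists")
    categories = set(a) | set(b)
    return sum((list(a).count(c) / n) * (list(b).count(c) / n) for c in categories)


if __name__ == "__main__":
    first = ["pass", "pass", "fail", "pass"]
    second = ["pass", "fail", "fail", "pass"]
    print(percent_agreement(first, second))  # 0.75
    print(chance_agreement(first, second))   # 0.5
```

When one decision (for example, pass) is much more common than the other, the chance level computed this way can be high, which is one way a raw figure such as 63 per cent can be “little better than chance.”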

Similar studies on the development 
of flight checks at the American In- 
stitute for Research have been carried 
out by Marley (36, 37), G. S. Miller 
(39), and Ericksen (9). These studies, 
sponsored by the United States Air
Force and the Civil Aeronautics
Administration, were concerned respectively
with objective flight checks for B-29
bombing crew members, B-36 bombing
crew members, and private pilots
flying light civilian aircraft. Ericksen
also developed a light plane proficiency
check to predict military flying success
(10) on a similar project sponsored by
the United States Air Force Human
Resources Research Center.

A similar set of proficiency meas- 
ures was developed by Krumm for 
Air Force pilot instructors (34, 35), 
also under the sponsorship of the 
Human Resources Research Center. 
These measures were based on more
than 4,000 critical incidents collected
from student pilots, flight instructors,
and supervisors. The critical incidents
were classified under three main
headings: (a) proficiency as a pilot;
(b) proficiency as a teacher; and
(c) proficiency in maintaining effective
personnel relations. The proficiency
measures developed in connection with
this study included paper-and-pencil
tests presenting critical situations and
requiring the instructor to select one
of several proposed solutions.

Another development of this type
carried on at the American Institute
for Research was the construction of
tests for evaluating research proficiency
in physics and chemistry for the Office
of Naval Research by M. H. Weislogel
(71). This study was based on the
critical incidents for research personnel
(20) discussed in a previous section.
The items for these proficiency measures
were based on detailed rationales. The
items described a practical research
situation in considerable detail and
outlined five specific choices concerning
such matters as the best thing to do
next, suggestions for improving the
procedure as reported, etc. The critical
behaviors tested in the items were taken
directly from the critical incidents. The
method of developing tests through the
use of comprehensive rationales has been
discussed generally in another paper (16).

Three studies have been reported by the
American Institute for Research in which
critical incidents were used as a basis
for developing performance situational
tests for measuring certain aspects of the
proficiency of military personnel. These
included the study of Sivy and Lange on
the development of an objective form of
the Leaders Reaction Test for the
Personnel Research Branch, Department of
the Army (57). This test included 20
situational problems based on the
critical requirements of the
noncommissioned combat infantry leader
as determined on the basis
of critical incidents collected in mili- 
tary maneuvers and during combat 
operations at the front in Korea. A 
second proficiency measure of a 
somewhat similar sort was developed 
for other types of personnel by R. L. 
Weislogel (73). The third study of 
this type was carried out by Suttell 
(61) for the Human Resources Research
Center. This study was based
on critical incidents collected in pre- 
vious studies of the American Insti- 
tute for Research and reported the 
development and preliminary evalua- 
tion of the Officer Situations Test. 
This test was designed to measure 
nonintellectual aspects of officer per- 
formance through the use of 16 
situational problems requiring about 
six hours of testing time. 

Because of the great difficulty in 
obtaining valid and reliable measures 
of typical performance, accurate 
measures of proficiency are essential 
for many types of activities. It is 
apparent that a comprehensive set of 
critical incidents can be of great
value in constructing such measures.

Training. Many of the applications
of the critical incident technique
to training problems have been carried
out for the military in special
situations, so that the reports are
classified security information. In
addition to work by Preston, Glaser,
and R. L. Weislogel, R. B. Miller and
Folley have utilized critical incidents
in establishing training requirements
for specific types of maintenance
mechanics (47) in a study for
the Human Resources Research Center.

Similarly, Ronan has used critical 
incidents as a basis for developing a 
program of training for emergency 
procedures in multi-engine aircraft 
(54) in a study for the United States 
Air Force Human Factors Operations 
Research Laboratory. On the basis 
of several thousand incidents re- 
ported by aircrew personnel regard- 
ing emergencies, three evaluation 
devices were prepared. These in- 
volved a conventional type multiple- 
choice test; a special multiple-choice 
test designed to measure the indi- 
vidual’s information concerning the 
important cues in the emergency 
situation, the appropriate actions to 
be taken, and the basic troubles or 
causes of the emergency; and a 
“flight check” to be used in evaluating
the performance of aircrew members
in electronic flight simulators.

The obvious relevance of the be- 
haviors involved in critical incidents 
and the specific details included make 
such incidents an ideal basis for de- 
veloping training programs and train- 
ing materials. 

A recent study by Collins (5) uses 
critical incidents as a basis for evalu- 
ating the effectiveness of a training 
program. The types of incidents re- 
ported by mothers after a two-week 
training course were significantly dif- 
ferent from those reported at the 
beginning of the program in a number 
of aspects relevant to the objectives 
of the program. The critical incidents appeared to provide a much more sensitive basis for revealing changes than other procedures used.

4. Selection and classification. Until recently, the customary approach of the research psychologist to the development of tests for selection and classification purposes has been as follows: A very brief period was given to study of the job. Following this, a wide variety of selection procedures was administered to a group of applicants or employees, and follow-up data were gathered. Since the research psychologist had little confidence in the accuracy of his analysis of the psychological elements required by the job, there was a tendency to try everything that was available and seemed even remotely related to the tasks involved. This has been called the “shotgun approach.” It was hoped that with a wide scatter at least a few of the tests would pay off. The critical incident technique has lent substantial support to the more thorough study of the job prior to initiating testing procedures. There is increasing feeling at the present time that a much larger percentage of the investigator’s time should be spent on determining the critical requirements of the job, so that the psychologist will have sufficient confidence in his tentative conclusions as to the nature of the important selection procedures to permit their use on a tentative basis prior to the collection of empirical follow-up data. This is especially important in those situations where the follow-up requires a very long period of time or where the number of cases that can be followed up is so small that definitive findings cannot be anticipated.

One of the most important requirements for developing a system of job analysis that will facilitate a relatively accurate identification of the important job elements for a specific task is to establish a clear and specific set of definitions for these job elements in behavioral terms. The American Institute for Research has carried out a series of projects on this problem. The first of these was a study undertaken by Wagner under the sponsorship of the United States Air Force School of Aviation Medicine to define the requirements of aircrew jobs in terms of specific job elements (67, 68). Several thousand critical incidents were gathered from aircrew members, and these were classified into 24 job elements. These job elements were inductively formulated from the critical incidents and were grouped under the four area headings: (a) learning and thinking; (b) observation and visualization; (c) sensory-motor coordination; and (d) motives, temperament, and leadership.

The development of more than 100 
proficiency tests to measure each of 
the various critical behaviors in- 
cluded in the 24 tentatively proposed 
job elements was reported by Hahn 
(30) for the School of Aviation Medi- 
cine. These tests were administered 
to a group of approximately 500 high 
school senior boys, and the intercor- 
relations were used to reformulate 
the tentative job elements. In a 
study just completed by Taylor (62) 
for the Human Resources Research 
Center, the results of applying an 
analytical procedure developed by 
Horst to study the interrelationships 
involved are reported. This analysis 
led to the formulation of a new set of 
20 job elements for each of which a 
selection test has been developed. 
These tests have been administered 
to several hundred aviation cadets 
and follow-up data on their success in 
aircrew training should be available 
soon. 

A similar project based on critical 
incidents collected from various ci- 
vilian jobs has been reported by the 
present author (2, 18, 19). The 
Flanagan Aptitude Classification 
Test Series, published in 1953, pro- 
vides aptitude measures for 14 criti- 
cal job elements. The Applicant 
Inventory, also published in 1953, 
measures attitudes predictive of job 
adjustment for hourly wage em- 
ployees. 

An effort to adapt the critical incident technique to the problem of developing civil service examinations is reported by Wager and Sharon (64). In an exploratory study they collected about 100 incidents regarding on-the-job behaviors of maintenance technicians. These incidents were used as a basis for determining job requirements in terms of behavior, and test items were developed for use in selecting applicants who could be expected to meet these requirements.

Another study that used critical incidents as a basis for developing tests to predict performance was carried out by O’Donnell (51). His test, designed to predict success in dentistry, was based on critical incidents collected by Wagner. The test includes items designed to predict, in part, the following three general areas: (a) demonstrating technical proficiency; (b) handling patient relationships; and (c) accepting professional responsibility. A follow-up study indicated moderate validity for these materials.

One of the few studies known to the author in which the critical incident technique was used in a project carried on outside the United States is Emons’ doctor’s dissertation (8). This study, carried out at the University of Liége, investigated the aptitudes of effective sales personnel in a large department store. A group of 40 supervisors provided 228 critical incidents. Nine categories were formulated from this group of incidents, and recommendations were made for an aptitude test to improve current selection procedures.

5. Job design and purification. Inadequate attention has been given to the scientific design of jobs to promote over-all efficiency. Where a team has several different types of tasks to perform, it is frequently possible to design each of the team members’ jobs so that only a few of the several tasks are involved. If the jobs have been studied by use of the critical incident technique, it may be possible to select and train each team member for only two or three of the critical job elements. This tends to maximize the effectiveness of performance with respect to each of the various types of tasks. Although such procedures have nearly always been informally used in planning the work of teams, the critical incident technique facilitates the collection of the data essential to this type of job purification.

Some preliminary work on this problem has been carried out at the American Institute for Research. Recommendations resulting from these studies for reducing the number of job elements required in certain common maintenance jobs are expected to lead to a saving of millions of dollars in training costs as well as to improving the effectiveness of job performance.

6. Operating procedures. Another application of critical incidents which has not been adequately exploited is the study of operating procedures. Detailed factual data on successes and failures that can be systematically analyzed are of great importance in improving the effectiveness and efficiency of operations. Such information can be efficiently collected by means of the critical incident technique.

Examples of such studies are pro- 
vided by a series of three projects 
carried out by the American Institute 
for Research under the sponsorship 
of the United States Air Force 
School of Aviation Medicine. The 
first of these involves the collection of 
critical incidents relating to near ac- 
cidents in flying reported by Vasilas, 
Fitzpatrick, DuBois, and Youtz (63). 
More than 1,700 critical incidents were collected from pilots and other aircrew members by procedures developed for this study. These incidents pointed to possible improvements in training, job design, and equipment design as well as in operating procedures.

The second of these studies was specifically concerned with the effect of the age of pilots and other crew members on aircrew operations. This study was reported by Shriver (56), and included tentative suggestions regarding various modifications in operating procedures.

The third study in this series, reported by Goodenough and Suttell (26), involved the collection of critical incidents regarding the impairment of human efficiency in emergency operations. These incidents provide a detailed statement of both the types of stresses that impair performance and the types of performance that are impaired under specific conditions. More than 2,000 critical incidents were collected in which impairment in performance on operational assignments was observed. These incidents were collected in Alaska and the Far East as well as in operational commands in the United States. This report contains suggestions for improving operations in emergency situations.

7. Equipment design. An application closely related to that just discussed involves the collection of critical incidents to improve the design of equipment. Reports of specific incidents from the field have always been a basis for equipment modifications. The critical incident technique facilitates the collection and processing of this type of information. Too often in the past, action was taken on the basis of informal reports from operating personnel. The collection of large numbers of critical incidents representative of operating experience provides a sound basis for modifying existing equipment and designing new models.

In the study by Fitts and Jones 
(12), mentioned above, which was 
carried out at the Aero-Medical 
Laboratory, 270 critical incidents 
relating to errors in reading and in- 
terpreting aircraft instruments were 
collected and analyzed. These led to 
a number of specific suggestions re- 
garding modifications in instrument 
displays. 

Other recent studies conducted at the American Institute for Research have used data from the critical incident technique along with other sources to develop procedures for designing jobs. The reports on these projects are classified for military security reasons. Other projects at the American Institute for Research have used the critical incident technique as a supplemental procedure for task analysis of equipment in the design stage of development (9, 10, 34, 35, 39). These procedures have been found very effective when used by psychologists working closely with engineers on the preparation of design specifications for new equipment.

8. Motivation and leadership. The study of attitudes has been somewhat limited and difficult to interpret because of the almost exclusive reliance on verbal statements of opinions and preferences. The critical incident technique has been applied in a few instances to gather factual data regarding specific actions involving decisions and choices. These studies suggest that critical incidents of this type may be a very valuable supplementary tool for the study of attitudes.

A recent study carried out by 
Preston of the American Institute for 
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Research for the Air Force’s Human 
Resources Research Center (53) used 
critical incidents as a basis for study- 
ing decisions of airmen to re-enlist in 
the Air Force. It is believed that 
these specific incidents provide valu- 
able information not contained in 
studies utilizing only data on opin- 
ions. 

A series of reports by Ruch (55) contains critical incidents on combat leadership collected from senior officers in the Far East Air Forces. These incidents provide a factual basis for the study of motivation and leadership of Air Force personnel engaged in combat operations.

9. Counseling and psychotherapy. Another field in which current techniques emphasize over-all impressions, opinions, and reports of single cases is counseling and psychotherapy. There appears to be a trend, however, in this field toward emphasizing the collection of factual incidents. This suggests that the critical incident technique may be useful in this area also.

Exploratory work has recently been done at the University of Pittsburgh with the critical incident technique to establish areas of change accompanying psychotherapy. A series of three master’s theses was carried out by Speth, Goldfarb, and Mellett (25, 38, 60). They collected 243 critical incidents from 11 psychotherapists. These incidents were collected about patients who had shown improvement and were replies to the question, “What did the patient do that was indicative of improvement?” Although these studies were primarily exploratory in nature, the tentative finding that different therapists stress different criteria of improvement and nonimprovement suggests that the critical incident approach may be of use not only in developing objective measures of improvement but also in experimental studies of the types of improvement resulting from the therapists’ use of specific procedures.

A somewhat related type of study 
initiated by Diederich and reported 
by Allen (1) describes the use of the 
technique to obtain critical incidents 
from students reporting things that 
caused them to like a fellow high 
school student either more or less 
than before. This study is being con- 
tinued to provide the basis for tests 
of specific value areas. An incidental 
finding of the study was that when 
an example of the kind of incident 
desired was shown on the form, 53 
per cent of the positive and 23 per 
cent of the negative behaviors re- 
ported were in the same category as 
the example given. 

SUMMARY AND CONCLUSIONS 

This review has described the development of a method of studying activity requirements called the critical incident technique. The technique grew out of studies carried out in the Aviation Psychology Program of the Army Air Forces in World War II. The success of the method in analyzing such activities as combat leadership and disorientation in pilots resulted in its extension and further development after the war. This developmental work has been carried out primarily at the American Institute for Research and the University of Pittsburgh. The reports of this work are reviewed briefly.

The five steps included in the critical incident procedure as most commonly used at the present time are discussed. These are as follows: (a) Determination of the general aim of the activity. This general aim should be a brief statement obtained from the authorities in the field which expresses in simple terms those objectives to which most people would agree. (b) Development of plans and specifications for collecting factual incidents regarding the activity. The instructions to the persons who are to report their observations need to be as specific as possible with respect to the standards to be used in evaluating and classifying the behavior observed. (c) Collection of the data. The incident may be reported in an interview or written up by the observer himself. In either case it is essential that the reporting be objective and include all relevant details. (d) Analysis of the data. The purpose of this analysis is to summarize and describe the data in an efficient manner so that it can be effectively used for various practical purposes. It is not usually possible to obtain as much objectivity in this step as in the preceding one. (e) Interpretation and reporting of the statement of the requirements of the activity. The possible biases and implications of decisions and procedures made in each of the four previous steps should be clearly reported. The research worker is responsible for pointing out not only the limitations but also the degree of credibility and the value of the final results obtained.
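Step (d), the analysis of the data, often amounts in practice to sorting each incident into a category and counting. The Python sketch below illustrates only that tallying step, under stated assumptions: each incident has already been coded by the analyst as a (category, effective-or-not) pair, the category names are borrowed from O’Donnell’s three areas purely for illustration, the data are invented, and the function name `tabulate` is hypothetical. It is not a reconstruction of any procedure used in the studies reviewed here.

```python
from collections import Counter


def tabulate(incidents):
    """Tally coded incidents as {category: (effective, ineffective)}.

    `incidents` is an iterable of (category, was_effective) pairs; the
    categorization itself is assumed to have been done by the analyst.
    Rows are ordered by total number of incidents, largest first.
    """
    incidents = list(incidents)  # allow generators, which can only be read once
    effective = Counter(cat for cat, ok in incidents if ok)
    ineffective = Counter(cat for cat, ok in incidents if not ok)
    categories = set(effective) | set(ineffective)
    rows = {c: (effective[c], ineffective[c]) for c in categories}
    # Sort by total count (descending), then by name so ties are deterministic.
    ordered = sorted(rows.items(), key=lambda kv: (-sum(kv[1]), kv[0]))
    return dict(ordered)


# Invented example data, for illustration only.
coded = [
    ("technical proficiency", True),
    ("patient relationships", False),
    ("technical proficiency", False),
    ("professional responsibility", True),
    ("patient relationships", True),
    ("technical proficiency", True),
]

for category, (eff, ineff) in tabulate(coded).items():
    print(f"{category:30s} effective={eff} ineffective={ineff}")
```

Keeping effective and ineffective counts separate reflects the usual practice of reporting both kinds of critical behavior; the more judgment-laden part of step (d), forming the categories themselves, stays with the analyst.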

It should be noted that the critical incident technique is very flexible and the principles underlying it have many types of applications. Its two basic principles may be summarized as follows: (a) reporting of facts regarding behavior is preferable to the collection of interpretations, ratings, and opinions based on general impressions; (b) reporting should be limited to those behaviors which, according to competent observers, make a significant contribution to the activity.

It should be emphasized that critical incidents represent only raw data and do not automatically provide solutions to problems. However, a procedure which assists in collecting representative samples of data that are directly relevant to important problems such as establishing standards, determining requirements, or evaluating results should have wide applicability.

The applications of the critical incident technique which have been made to date are discussed under the following nine headings: (a) measures of typical performance (criteria); (b) measures of proficiency (standard samples); (c) training; (d) selection and classification; (e) job design and purification; (f) operating procedures; (g) equipment design; (h) motivation and leadership (attitudes); and (i) counseling and psychotherapy.

In summary, the critical incident technique, rather than collecting opinions, hunches, and estimates, obtains a record of specific behaviors from those in the best position to make the necessary observations and evaluations. The collection and tabulation of these observations make it possible to formulate the critical requirements of an activity. A list of critical behaviors provides a sound basis for making inferences as to requirements in terms of aptitudes, training, and other characteristics. It is believed that progress has been made in the development of procedures for determining activity requirements with objectivity and precision in terms of well-defined and general psychological categories. Much remains to be done. It is hoped that the critical incident technique and related developments will provide a stable foundation for procedures in many areas of psychology.
It is customary to distinguish two main types of validity; these may be named higher-order (internal) and lower-order (external) validity. Internal validation consists essentially in measuring the correlation between different tests supposed to measure the same variable. Consideration of this method of validation will be reserved until later, since the tests to be evaluated below have not generally been validated in this way, and the use of this method is comparatively rare. Lower-order validity implies the use of an external criterion against which the test is validated. In the development of a test of brain damage, for instance, the external criterion would consist of a number of clinical groups, such as a brain-damaged group of patients, a group of psychiatric (functional) patients without brain damage, and a group of normal controls; a valid test of brain damage would then be expected to distinguish patients in the brain-damaged group from those in the other two groups. The difficulty in using this method of validation lies, of course, in the fact that the criterion itself is in need of validation. It has been shown by Ash (5) and others that the reliability of psychiatric diagnosis is so low that it is difficult to rely on such classifications in the development of tests. Even when very broad groups are used, the degree of error may be considerable. It seems necessary to use the classification system given above, however, because this is the procedure adopted by the authors of most of the tests to be discussed, because the groups may be more reliably discriminated than in most cases, and because, in the case of the brain-damaged group at least, independent confirmation may be forthcoming in the shape of neurological signs, post-mortem examination, etc.

¹ The writer expresses his appreciation to members of the psychological department of this hospital for helpful comments on this paper, which was read at a meeting of the Committee of Professional Psychologists at the Maudsley Hospital on May 9, 1953. The work reported in this paper was made possible by a grant from the Research Fund, made available from the endowment by the Board of Governors of the Bethlem Royal Hospital and the Maudsley Hospital.


It is assumed, then, that it is possible to 
isolate such broad groups for the purposes of 
research; however, there remain other condi- 
tions which must be fulfilled before the valid- 
ity of a given test can be discussed. In general, 
these may be summarized as follows: 

a. The test should present data for ade- 
quate samples of the above-mentioned three 
groups (brain-damaged, functional, and nor- 
mal). 

b. The data should be presented in such a
form as to enable the clinician using the test 
to estimate the degree of possible error when 
assigning a patient to any one of the groups. 
Such data may be stated in three different 
ways. First, the optimum cutoff point may be 
given; this is the point at which it is possible 
to identify as many brain-damaged patients 
as possible and at the same time misclassify 
as few functionals and normals as possible. 
(If the distributions are normal, this can be 
stated simply in terms of the mean and stand- 
ard deviation.) Second, the point beyond 
which no normals or functionals of the sam- 
ples fall should be given. Third, the point be- 
yond which no brain-damaged patient of the 
sample falls should also be given. 

c. The reliability of the results should be 
verified by applying the test to new groups 
that are independent of the original criterion 
groups. Alternatively (or, preferably, in addition), the findings should be confirmed by
another worker in another hospital. 

d. The influence of various factors, such as age, sex, and intelligence, and any special factors (such as visual acuity when perceptual tests are used, and motor coordination when mechanical tests are used) should be controlled.
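Condition b asks for three points on the score scale. The Python sketch below computes them from two samples under explicit assumptions that are not in the text: higher scores indicate brain damage, a case is called brain-damaged when its score is at or above the cutoff, the functional and normal groups are pooled as “controls,” and the “optimum” cutoff is read as the observed score that minimizes the total number of misclassified cases (a weighted cost would be an equally defensible reading). The function name `cutoff_summary` and all scores are hypothetical.

```python
def cutoff_summary(damaged, controls):
    """Summarize the three cutoff points for a test of brain damage.

    Assumes higher scores indicate brain damage and that a case is called
    brain-damaged when its score is >= the cutoff. `controls` pools the
    functional and normal groups.
    """
    def errors(cutoff):
        misses = sum(1 for s in damaged if s < cutoff)          # damaged called non-damaged
        false_alarms = sum(1 for s in controls if s >= cutoff)  # controls called damaged
        return misses + false_alarms

    # Candidate cutoffs are the observed scores; min() keeps the lowest on ties.
    candidates = sorted(set(damaged) | set(controls))
    optimum = min(candidates, key=errors)
    return {
        "optimum_cutoff": optimum,
        "errors_at_optimum": errors(optimum),
        # Second point: no control in the sample scores above this value.
        "no_controls_above": max(controls),
        # Third point: no brain-damaged patient in the sample scores below this value.
        "no_damaged_below": min(damaged),
    }


# Invented scores, for illustration only.
damaged = [14, 17, 19, 22, 25, 30]
controls = [5, 8, 9, 12, 15, 16, 18]  # functionals and normals pooled
summary = cutoff_summary(damaged, controls)
print(summary)
```

With these invented scores the optimum cutoff is 17, with two misclassifications; no control scores above 18 and no brain-damaged patient below 14, so only scores between 14 and 18 fall in the region where the samples overlap.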


It will be the purpose of this paper 
to show that most of the tests pur- 
porting to be measures of brain dam- 
age do not meet the conditions set out 
above and that, therefore, their valid- 
ity either cannot be considered as es- 
tablished or cannot even be evaluated. 
It may be pointed out, however, that, 
even if these conditions were ade- 
quately met, no general judgment 
can be made about the validity of a 
particular test. Whether a given test 
is considered to be a valid measure of 
brain damage, or whether it is not, 
will depend on many factors. The 
most important of these will be 
whether or not the test works in practice. Because a test is usually standardized on relatively pure groups, the discriminating power must necessarily drop when the test is used clinically. Again, precise confidence limits
are difficult to apply. Thus, a test 
that identifies only 20 per cent of 
brain-damaged patients admitted to 
a hospital may be a very useful clini- 
cal instrument if these patients are 
not identifiable at the time of testing 
by any other means. On the other 
hand, a test that identifies 60 per cent 
of all brain-damaged patients may not 
be very useful clinically because most 
of the patients it identifies are obvious 
cases of brain damage and can be detected by simpler means. In some instances, it may be possible to use a cutoff point in such a way that a patient falling above it is unequivocally identified as brain-damaged, while, if he falls below it, he (even if brain-damaged) is not misclassified; the question simply remains an open one, no statement being made about the patient.
However, when considering the standardization of a test, one may reasonably demand a low percentage of misclassification, because the groups are carefully chosen for their clinical differences. The various tests of brain damage will now be considered to see how far they fulfill the criteria already laid down.

The tests will be considered under two broad headings: those tests which employ qualitative methods and those which employ quantitative methods. Those tests using quantitative methods will, in turn, be grouped into those utilizing the concept of deterioration and those measuring perceptual or motor functions. Such a division is, of course, quite arbitrary, and is dictated by convenience.

QUALITATIVE TESTS OF BRAIN DAMAGE

The criteria laid down above presuppose adequate statistical treatment of data. However, such treatment is almost entirely lacking in one of the most widely used and reputable batteries of tests of brain damage: the tests of abstract concept formation developed by Goldstein and Scheerer (22) and their colleagues. These tests are so well known that it is unnecessary to describe them. The basic criticism to be made of these tests is not that they are invalid, but that there is no basis for a discussion of their validity. The tests are unique in diagnostic psychological testing in providing no quantitative data on the subjects used and providing no percentage of incorrect diagnoses, in ignoring the effects of age and intelligence level completely, and in assuming that the performance of normal people of average intelligence will be without error. Under these circumstances, it is clear that the validity of the tests is something private to each individual user. It is possible that, in the hands of a skilled clinician, the tests have a high validity; but, even in the case of Goldstein himself, no information is available about the number of times he has been in error. Two further points may be made: First, there has never been any clear agreement concerning what the tests measure. Goldstein believes that they measure the ability to abstract, and that brain injury leads to a defect in assuming the abstract attitude, which is revealed in test performance; on the other hand, Hutton (36) believes that failure on the Block Design test is due to overabstraction, not failure of abstraction. Second, it is significant that the Goldstein-Scheerer tests have been used to determine the presence or absence of schizophrenic thought disorder, and that other tests, developed specifically to test for schizophrenia, bear a strong resemblance to the Goldstein tests, e.g., the Hanfmann-Kasanin Test of Concept Formation. Little work has been done to differentiate between the performance of schizophrenics and brain-damaged patients on these tests; hence, they are very difficult to apply clinically.

That the necessary quantification would be possible may be easily inferred from examination of Goldstein’s manual, where many possible dimensions of measurement are indicated in qualitative form. A beginning has, in fact, been made. Boyd (7), using 54 normal hospitalized subjects, set up quantitative norms for the Block Design test by giving weights to the various steps utilized by Goldstein. When he divided his group according to Wechsler IQ, he found that perfect performance would indeed be expected, provided the IQ was 100 or more. As the IQ dropped, however, so did the score on the Block Design test, from a mean score of 120 (SD = 0) for IQ 100-109 to a mean score of only 99.5 (SD = 34.5) for IQ 66-79. He was able to give limits within which a person of a given IQ level would be expected to fall. He thus demonstrated that intelligence is an important factor in performance on these tests; he also found that psychotics tended to do worse than brain-damaged patients (though his numbers were very small).

Lidz, Gay, and Tietze (41), using the scoring system developed by Kohs, found that the Kohs test discriminated significantly between 21 organics with deterioration and 15 nondeteriorated schizophrenics (with a misclassification of only 2/36). However, there was no control for age, and the age difference between the two groups was certainly significant. Another attempt to quantify the Block Design test was reported by Armitage (4). Using three groups of subjects (normal, neurotic, and brain-damaged), he calculated the percentage of controls and organics requiring assistance at each of four steps. He concluded that the quantitative results were disappointing from the standpoint of a screening device; so, to determine whether the discriminatory ability of the test could be increased, he adopted a multiple approach. This involved the computation of such variables as the time taken to complete each design, the number of incorrect moves, the order or sequence of placing the blocks, and the number of correct first block placements. None of these methods proved successful.

Tooth (67) gave a series of tests, including the Kohs Block test and the Weigl Color Form Sorting Test, to 100 cooperative naval officers and ratings admitted to a naval hospital with a history of an injury to the head and a diagnosis of postconcussional state. As controls, he used 50 convalescent patients in the surgical wards of the same hospital and 50 neurotics carefully chosen from 2,000 cases seen in a naval psychiatric clinic. After careful statistical analysis of the results, he concluded that the method did not give sufficient quantitative discrimination between head injury cases and normals to be of much practical importance in the assessment of the individual case. Furthermore, in these two tests, a difference of as great a magnitude was found between the normal controls on the one hand and a series of neurotic patients in whom no organic condition was known to exist on the other.

With respect to object sorting, Halstead (27), using groups of cases with lesions in various areas of the brain, compared them with a normal group on an object-sorting test, larger in content and presented under conditions somewhat different from those obtaining in the Goldstein test. The sorting behavior was analyzed according to a number of criteria, including the percentage of objects grouped, the total percentage of objects recalled after five minutes, and the percentage of objects grouped that were recalled after the same interval. With respect to these three variables, for 11 normals and 11 cases of frontal lobe injury there was no overlap in the scores of the two groups on the first and third variables, and only two brain-damaged cases exceeded the lowest normal score on the second variable. On the other hand, cases with lesions in other parts of the brain showed considerable overlap with normals. Halstead’s results confirmed to some extent Goldstein’s hypothesis that performance as a whole on this test tends to be lowered in cases of frontal lobe lesion. He also offered some evidence for the hypothesis that patients with injury to the frontal lobes cannot sort according to an abstract principle. In terms of the number of abstract groupings produced, there was again no overlap between the normals and the cases of frontal lobe injury.

A rather radical and important departure from the usual procedure in the Block Design test has been made by Grassi (23, 24) in his Block Substitution Test for measuring organic brain pathology. Essentially, the main innovations are that the patient copies not a drawing but a set of blocks, and that an attempt is made to measure both concrete and abstract reproduction at two levels. The test consists of five designs or models, each of which is reproduced by the patient at four levels of increasing complexity. In the simple concrete task, the patient copies only the top side of the model. In the simple abstract task, he copies the top side again, but this time using different colors from those of the model. In the complex concrete task, the patient has to copy the model correctly with respect to all six sides, while in the complex abstract task, he copies the complete model again, but this time using different colors from those of the model. The scoring system is simple, accuracy and time being rewarded; the maximum possible score is 30. The standardization groups are unusually large for this kind of work, consisting of 86 normals, 86 schizophrenics (equally divided into deteriorated and nondeteriorated cases), 72 organics, and 30 postlobotomy cases. Grassi states that no influence of age, sex, or other factors was found, while test-retest reliability was high (+.85). Means and ranges are given for the four groups and show that (using a cutoff point of 16) there was no overlap between the two organic and the two nonorganic groups. If this finding can be confirmed, it is clearly of great importance. There is some doubt, however, about the accuracy of Grassi’s figures. For example, although the ranges of the organic and nonorganic groups do not overlap, Grassi reports in another section that six schizophrenics scored below 16 and were not included in the standardization data, and also that eight organics scored “slightly above” 16 and were correctly identified by qualitative examination. However, the misclassification is sufficiently small to make further research on this test imperative.
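
The claim of “no overlap” at a cutoff of 16 amounts to a simple check on two sets of scores. The sketch below is an illustration, not Grassi’s procedure: it assumes that a score below the cutoff is read as organic (the text does not say how a score of exactly 16 is treated), the function name is hypothetical, and the scores are invented.

```python
def cutoff_separates(organic_scores, nonorganic_scores, cutoff=16):
    """Return True if the cutoff splits the two groups with no overlap.

    Assumed rule (not stated in the source): score < cutoff -> organic,
    score >= cutoff -> nonorganic.
    """
    return max(organic_scores) < cutoff <= min(nonorganic_scores)


# Invented scores on the 0-30 scale, for illustration only.
organics = [6, 9, 12, 15]
nonorganics = [17, 21, 26, 30]
print(cutoff_separates(organics, nonorganics))         # True

# Re-admitting a single low-scoring nonorganic case, as with the
# excluded schizophrenics, is enough to break the separation.
print(cutoff_separates(organics, nonorganics + [14]))  # False
```

This is why the excluded cases matter: one nonorganic score below the cutoff, or one organic score at or above it, is enough to turn “no overlap” into some overlap.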

Grassi’s (24) attitude toward his own statistics is difficult to understand. He writes: “It cannot be too strongly stressed that test scores taken alone will lead to incorrect conclusions. The score is intended as a guide, not as the sole criterion for the final conclusion. Behavior cannot be too greatly emphasized and should be, in most cases, the basis for diagnosis. Test and intellectual level are supporting factors. They must not, ever, be the sole basis for final classification” (24, p. 63, italics his). Two comments are relevant here. In the first place, for the standardization groups at least, since there is no overlap, it is hard to see how a quantitative score could lead to a false diagnosis of brain damage. Second, a subjective statement that the patient’s behavior was qualitatively such and such is in fact a preliminary quantitative statement of behavior, albeit a crude and inaccurate one. The aim of any test constructor must surely be to quantify his initially crude and unrefined observations as soon as possible and thus set up proper control. Why Grassi, having done this, should then turn his back on it is difficult to see. The consequences, however, are serious. Although Grassi himself states that intelligence can influence performance, no adequate information on it is given. This would have been especially valuable in view of the relatively large numbers of subjects used. In spite of these criticisms, the Grassi test represents an important advance over the original Goldstein tests.

Clinical use of the Color Form test of the Goldstein battery has suggested that it is too easy to be useful in the detection of brain damage except in severe cases. This was confirmed by McFie and Piercy (43), who used 55 cases of known brain damage and scored the test simply as “pass” or “fail” on the basis of the patient’s ability to sort the pieces both by form and by color. They found that only 9 of the 55 patients failed the test.

An important modification of the Color Form test has, however, been published by Scheerer (60). Three large model figures (a circle, a triangle, and a square) are used, and there are 12 test figures, four being subsumed under the concept of roundness, four under the concept of triangularity, and four under that of squareness. The three large figures lie on the table as model figures. The subject is shown each of the remaining figures in turn and asked to indicate with which model figure it belongs. Each figure is removed from the subject’s sight before the next one is presented. A simple, objective system of scoring has been devised, failure being counted when the subject incorrectly assigns a figure. If this happens, a series of three “helps” is given, similar in nature to the successive steps on the Block Design test. The test was standardized on four groups of subjects. The first group consisted of 44 college students; the second group was made up of 20 noncollege students; the third group consisted of 20 brain-damaged patients; and the fourth group included 20 retarded high school pupils, who were used as controls for those brain-damaged patients of low intelligence. Scheerer found that the percentage of subjects needing the three “helps” fell rapidly after the first help for the three non-brain-damaged groups, but that all the brain-damaged patients needed the third help and 75 per cent of them failed all three helps; no subjects in the other three groups failed all three helps.
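
The group summaries Scheerer reports (the percentage needing each help, and the percentage failing all three) can be tabulated as in the sketch below. The per-subject coding is an assumption made for illustration: each subject is summarized by the deepest help needed across all figures, which sidesteps the unresolved question of repeated errors after the first help. The function name and the example group are hypothetical.

```python
# Hypothetical per-subject coding:
#   0     = no help needed
#   1..3  = deepest "help" needed before figures were assigned correctly
#   4     = still failing after all three helps
FAILED_ALL_HELPS = 4


def help_profile(outcomes):
    """Percentage of a group needing at least each help, and failing all three."""
    n = len(outcomes)
    needed = {k: 100 * sum(o >= k for o in outcomes) / n for k in (1, 2, 3)}
    failed_all = 100 * sum(o == FAILED_ALL_HELPS for o in outcomes) / n
    return needed, failed_all


# Invented group of four: all need the third help, three of the four fail all three.
print(help_profile([4, 4, 4, 3]))
# ({1: 100.0, 2: 100.0, 3: 100.0}, 75.0)
```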

This study is of importance, not only for its success in identifying brain-damaged patients, but also because it represents one of the few serious attempts at quantification of the Goldstein-Scheerer tests. However, some criticisms are possible. The instructions contained in the original article are inadequate for clinical use; for instance, it is not made clear what happens if the subject makes more than one error after the first help has been given. The test objects are presented in a fixed, not random, order, so that a pattern of response may be set up which masks inability to do the test. In this connection it may be noted that the figures given by Scheerer indicate that, compared with the noncollege group, fewer of the retarded group needed the third help, and that, compared with the college group, fewer retarded subjects needed the second help. Again, the test has not been given to a group of functional patients, so the degree of overlap between such a group and the brain-damaged group is not known. As far as is known, no subsequent information has been made available on this test, and no critical studies have been published.

Zangwill (in Buros, 10) has criticized the Goldstein tests on the general grounds that they are qualitative; that some psychotics show impairment; that some brain-damaged patients with traumatic injuries behave essentially normally on the tests; and that gross impairment on the tests, due to a specific lesion, may be present in the absence of any occupational or social difficulties. It may be added that no information is available concerning the factorial composition of the tests.


QUANTITATIVE TESTS OF BRAIN DAMAGE

Tests Employing the Concept of Deterioration


The tests considered under this heading are based on the principle that brain damage leads to deterioration of an irreversible nature. This deterioration usually is contrasted with that found in the functional disorders, wherein deterioration is considered to be more apparent than real and to be due to inattention, the assumption being that the patient retains normal ability and would show it if he could be made to perform at his best level. It is not proposed to discuss the concept of deterioration as such, and these tests will be criticized only in so far as they claim to distinguish brain-damaged patients from other groups.

The Hunt-Minnesota Test for Organic Brain Damage (34) consists of three major divisions: a vocabulary test, which is relatively insensitive to brain damage; a group of tests sensitive to deterioration; and a group of interpolated tests. The subject’s Stanford-Binet vocabulary score, in relation to his age, determines the score level at which he is expected to perform the more sensitive tests. The deterioration tests, consisting of pairs of words and of designs that the subject is expected to associate and later recall and recognize, determine the level at which he is actually functioning. The amount of discrepancy between the subject’s expected score and the score he actually makes on the word and design associations (when corrected for age) is the basis for the diagnosis of brain damage. The discrepancies are expressed as T scores; those T scores which fall higher than a certain critical point are considered to be indicative of brain damage. Originally, this critical T score was fixed at 68; later it was reduced to 66. The test may be given in either a short or a long form.
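
A T score is a standard score with a mean of 50 and a standard deviation of 10. Assuming that Hunt’s discrepancy T scores follow this convention (his control-group mean of 50, cited below, is consistent with it), the two cutting points can be read in standard-deviation units. Here $d$ is taken, as an assumption, to be the expected-minus-actual discrepancy, $\bar{d}$ and $s_d$ its control mean and standard deviation, and $\Phi$ the normal distribution function; the second line holds only if the control distribution is also normal:

$$
T = 50 + 10\,\frac{d - \bar{d}}{s_d}, \qquad
T_c = 68 \;\Longrightarrow\; \frac{d - \bar{d}}{s_d} = 1.8, \qquad
T_c = 66 \;\Longrightarrow\; \frac{d - \bar{d}}{s_d} = 1.6
$$

$$
P(T > 68) = 1 - \Phi(1.8) \approx 0.036, \qquad
P(T > 66) = 1 - \Phi(1.6) \approx 0.055
$$

On these assumptions, lowering the critical score from 68 to 66 would raise the expected proportion of normal subjects misclassified as organic from roughly 4 to roughly 5 or 6 per cent.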

Two groups of subjects were used in the development of the test: 33 patients suffering from brain damage (all but three were cases of diffuse damage); and 41 controls, consisting of 15 neurotics, 11 normals, 6 psychotics, and 8 nonpsychiatric patients. All subjects had to be able to read and speak English, had attended school to the third grade, were adequately cooperative and attentive, had adequate muscular coordination and sensory acuity, and were between the ages of 16 and 70. For discriminative purposes, 25 persons equated for vocabulary and age were drawn from each group. Using a cutting point of 68, Hunt found that only one of the 50 standardization subjects was misclassified, and that, of the remaining 24 subjects, only three were misclassified. (He ignored three subjects who scored 68.)

Hunt’s use of the interpolated tests (tests of concentration and attention) was based on the following assumptions: deteriorated brain-damaged patients would fail the learning tests but not the interpolated tests; functional, “apparently deteriorated” patients would succeed on both learning and interpolated tests; functional, “genuinely deteriorated” patients would fail both learning and interpolated tests. Implicitly, therefore, failure on the interpolated tests was to be utilized as a measure of whether a functional patient was really, or only apparently, deteriorated. In fact, this ingenious hypothesis was not directly tested by Hunt; his only finding was that the interpolated tests did not differentiate the two groups he used.
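
Hunt’s assumptions amount to a small decision table over two pass/fail outcomes. Encoding them, as in the sketch below (the dictionary and function names are hypothetical), makes two points explicit: among patients who fail the learning tests, it is failure on the interpolated tests that is supposed to separate the “genuinely deteriorated” functional patient from the brain-damaged one, and one of the four possible combinations is not covered by the stated assumptions at all.

```python
# Hunt's three stated assumptions, keyed by
# (fails learning tests, fails interpolated tests).
HUNT_ASSUMED_PATTERNS = {
    (True, False): "deteriorated brain-damaged",
    (False, False): 'functional, "apparently deteriorated"',
    (True, True): 'functional, "genuinely deteriorated"',
}


def hunt_expected_group(fails_learning: bool, fails_interpolated: bool) -> str:
    """Look up the group that Hunt's assumptions associate with a pass/fail pattern."""
    return HUNT_ASSUMED_PATTERNS.get(
        (fails_learning, fails_interpolated),
        "not covered by the stated assumptions",
    )


print(hunt_expected_group(True, True))   # functional, "genuinely deteriorated"
print(hunt_expected_group(False, True))  # not covered by the stated assumptions
```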

While, in its construction and standardization, this test fulfills to some extent the first two criteria laid down earlier in this paper, the results reported by Hunt have not been confirmed by other workers, and important reservations in the use of the Hunt-Minnesota vocabulary test are necessary.

Other investigators have not found the same discrimination between normals and organics. Thus, Aita and his associates (1), using groups already referred to in the study by Armitage (4), found that 10 per cent of their control subjects obtained a T score greater than 90, but that only 2.2 per cent of their brain-damaged subjects obtained a score as high as this. On the other hand, 41.3 per cent of their brain-damaged subjects obtained a score that was in the normal range, i.e., below 70. Many of the brain-damaged patients actually scored well below the cutoff point used by Hunt, 23.9 per cent of them falling within the range 50-54. When the brain-damaged group was divided into three categories (severe, moderate, and mild), only the mean T score of the severe group was significantly different from that of the controls.

A study by Canter (11) reinforced this finding that the amount of overlap between a brain-damaged group and a normal group may be considerable. For 47 arteriosclerotic patients the mean T was 69.04 (SD = 12.49); that is, the group mean fell within normal limits. Malamud (45), in a preliminary study, found that six out of ten members of the psychology department at her hospital were suffering from brain damage according to results on this test! She therefore administered the test to 64 subjects who satisfied Hunt’s basic criteria. She found that 54.7 per cent of these normal subjects obtained scores indicating organic brain damage. The exact critical score used did not influence the results appreciably. When all vocabulary scores of 33 words or more were arbitrarily reduced to 32, no less than 42.2 per cent of the total group still fell within the pathological range.

In addition to this overlap between organics and normals, it has been shown by Meehl and Jeffery (47) that at least some functionals may show pathological scores on this test. They gave the test to a group of 15 patients with functional depressions, of whom 9 were psychotic and 6 neurotic. The possibility of brain damage was ruled out in all cases, and 11 of the subjects did not fail any of the interpolated tests. The mean T score of the entire group was 70.2 (SD = 15.41). Using a critical score of 70, they found that one in four functionally depressed patients would obtain an organic score; using a cutoff point of 66, the ratio would be one in three.

We have already seen that Malamud found that a high vocabulary score did not markedly affect the proportion of normals showing a pathological score on this test. However, a study by Juckem and Wold (37) showed that when the test was given to a group of 50 college students who reached Superior Adult III level on the Binet vocabulary test, they obtained a mean T score of 69.6, whereas Hunt’s control-group mean was 50. Only 3 per cent of Hunt’s control group obtained a score greater than this mean score. Of this college population, no less than 60 per cent had scores above Hunt’s critical point. Juckem and Wold conclude that the test yields far too many false positives among persons of high vocabulary level. If a normal T score were to be obtained, the vocabulary level would have to be reduced to 21 words.

These studies show that, in practice, the test has not lived up to the claims made for it by its author, and its usefulness as a test of brain damage is very much in doubt.

The Shipley-Hartford Retreat Scale (64) is a short test of two parts: one, a vocabulary test, which is supposed to hold up with age or illness; the other, a set of abstract reasoning problems, supposedly sensitive to deterioration. A Conceptual Quotient (CQ) is derived from scores on the two parts of the test and may be defined roughly as

$$\mathrm{CQ} = \frac{\text{Mental Age (Abstractions Test)}}{\text{Mental Age (Vocabulary Test)}} \times 100$$

Mental age norms were established on 1,046 young subjects who had been given a group intelligence test. It was assumed that normal persons, regardless of native intelligence, would approach CQ’s of 100. This assumption proved to be roughly true within normal limits. A CQ above 90 was considered normal; one between 76 and 89, suspicious; and a CQ below 75, pathological. The test was validated on 171 state hospital cases and 203 private hospital cases.
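
The CQ and its interpretive bands can be sketched as follows. The function names are hypothetical and the mental ages in the example are invented. Note that the bands as quoted leave some values unassigned (for instance 90, 75, or anything strictly between 89 and 90); the sketch reports such values rather than guessing which band they belong to.

```python
def conceptual_quotient(ma_abstractions: float, ma_vocabulary: float) -> float:
    """CQ = MA(Abstractions Test) / MA(Vocabulary Test) x 100."""
    if ma_vocabulary <= 0:
        raise ValueError("vocabulary mental age must be positive")
    return 100 * ma_abstractions / ma_vocabulary


def classify_cq(cq: float) -> str:
    """Apply the quoted bands; values they leave out are reported as such."""
    if cq > 90:
        return "normal"
    if 76 <= cq <= 89:
        return "suspicious"
    if cq < 75:
        return "pathological"
    return "not assigned by the stated bands"


cq = conceptual_quotient(12, 15)  # invented mental ages, in years
print(cq, classify_cq(cq))        # 80.0 suspicious
```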

Shipley and Burlingame (65) claimed that the test would discriminate between normals, schizophrenics, and organics, and it is for this reason that the test is included here. They found that organics did worst, that neurotics and normals fell close together, and that the psychotics came in between.

This test may be criticized severely on many grounds. The standardization data were completely inadequate, only young intelligent normals being used in the construction of the scale. There was no control for age, sex, or intelligence, yet the factor of age is of particular importance in this test. Subsequent research has shown beyond doubt that the test does not discriminate between organic and other groups without considerable overlap.

Thus, Aita and his colleagues (1) found that, of a control group of 61 normal subjects, 47 per cent obtained a CQ that was suspicious or pathological, and 26 per cent of these 61 subjects obtained a definitely pathological score. Of a brain-damaged group of 70 patients, 33 per cent fell within the normal range of scores. Contrary to Shipley’s assertion, they found that mild or moderate neurosis lowers the CQ, although they present no figures. Canter (11) used the Shipley-Hartford scale on 47 cases of multiple sclerosis and obtained a mean CQ score of 85.8 (SD = 14.31). Despite the unfavorable difference in age between his group (mean age, 32) and Shipley’s standardization group, he found that 50 per cent of his cases obtained CQ’s greater than 76, and 40 per cent obtained CQ’s greater than 90, indicating considerable overlap with normals.

Magaret and Simpson (44) gave the test to 50 patients, psychotics and neurotics aged 40-49, and found a mean CQ of 74 (SD = 12.7), with a range of scores from 55 to 105. Of the 50 cases, 29 obtained a CQ lower than 70. Here again, although age would be expected to work in favor of lower scores, the overlap with normals would still be considerable, while the overlap with organics would be even greater. Furthermore, correlations of the CQ with psychiatric ratings of deterioration and with the Wechsler deterioration index were not significantly greater than zero. These results are the more remarkable in that Garfield and Fey (21) showed that, in fact, the CQ did decline steeply, solely as a function of age. Their results otherwise agree with those of Magaret and Simpson: using a group of 100 patients (50 psychotic, 37 neurotic, 13 unclassified), they found a mean CQ of 79.1 for the psychotics and 85.2 for the neurotics.

A further study by Manson and Grayson (46) was even more disturbing. Of Shipley’s standardization group, 26 per cent obtained a CQ below 90. Of a group of 1,262 military prisoners examined by these authors, no less than 59.1 per cent obtained a CQ less than 90; 34.6 per cent a CQ less than 80; and 14.2 per cent a CQ less than 70; the mean CQ for the group was only 88.8. They suggested that the reason for this discrepancy was a spuriously high vocabulary score, and showed that substituting Army General Classification Test scores for the vocabulary scores produced figures very similar to those of Shipley. This confirms the previous findings that spuriously organic scores may be obtained on tests which use vocabulary level as a measure of previous intellectual functioning. Manson and Grayson also report that 45 per cent of severe neurotics scored below 80 on this test. This was confirmed by the work of Kobler (39), who reported that, of 500 neurotics at an army rehabilitation center for combat fatigue, 78 per cent made CQ’s below 90; 26 per cent had CQ’s below 70 and thus fell within the organic range.

Fleming (20) reported that 20 depressives obtained a mean CQ of only 73.2. However, the mean age of his group was 57.6 years. Ross and McNaughten (56) used 90 subjects with head injury and found that the results bore no relation to severity of the injury or to EEG or pneumoencephalographic evidence of cerebral damage except in cases of severe injury. No statistical data were given, however.

How far the original faulty standardization of the test and lack of care in its construction have been responsible for the failure of subsequent workers to establish the validity of the test is difficult to estimate. The findings quoted, however, certainly indicate that the test has not been adequately validated.

There have been a number of attempts to develop indices of deterioration by using the Wechsler subtests in various combinations. The rationale of these tests is well known. A ratio is calculated between those tests which are said to hold up with age and those which do not. The underlying assumption is the rather curious one that organic deterioration is similar to the deterioration accompanying age, differing only in its early onset. Wechsler (68) found that his deterioration index (DI) discriminated between young normals and young brain-damaged patients, the percentage overlap, however, being high. Thus, there was a restriction on the usefulness of the test right from the start. Levi, Oppenheim, and Wechsler (40) specifically claimed that the DI was useful, not only in confirming, but also in discovering, organic conditions. They claimed that it would assist in differentiating organic memory impairment from hysterical amnesia; in finding corroborative evidence of organic involvement where clinical and neurological data are not clear-cut; in distinguishing between mental deficiency and mental deterioration; and in differentiating between psychosis with and without organic deterioration. With the exception of the first, these claims have been tested and invariably have been found wanting; and it seems safe to say that the first has not been found wanting only because it has not been tested.
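
The ratio described above is commonly written as a percentage loss of the “don’t hold” subtests relative to the “hold” subtests, (Hold − Don’t Hold) / Hold × 100, which is why the studies below speak of cutting scores of 10 and 20 per cent. The sketch assumes that form, computed on summed subtest scores. Which subtests go into each set, the equal-count check, and whether a case exactly at the cutting score counts as deteriorated are all assumptions; the function names are hypothetical and the scores invented.

```python
def deterioration_index(hold_scores, dont_hold_scores):
    """Percentage loss: (Hold - Don't Hold) / Hold * 100, on summed subtest scores.

    Assumption: both lists contain the same number of subtests.
    """
    if len(hold_scores) != len(dont_hold_scores):
        raise ValueError("use the same number of subtests in each set")
    hold = sum(hold_scores)
    dont_hold = sum(dont_hold_scores)
    if hold <= 0:
        raise ValueError("hold total must be positive")
    return 100 * (hold - dont_hold) / hold


def exceeds_cutting_score(di, cutting_score=10.0):
    """Assumed rule: a DI at or above the cutting score counts as deterioration."""
    return di >= cutting_score


# Invented scaled scores for four "hold" and four "don't hold" subtests.
di = deterioration_index([10, 11, 9, 10], [8, 9, 7, 8])
print(di)                             # 20.0
print(exceeds_cutting_score(di))      # True
print(exceeds_cutting_score(di, 25))  # False
```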

Two important modifications of the Wechsler DI are Reynell’s index (52), which makes use only of the verbal subtests, and Hewson’s deviation ratios (31), which are based on various combinations of the subtests. Reynell’s index, however, has been incorrectly criticized by some workers: its intention was not to discriminate brain-damaged cases from others, but to discriminate those brain-damaged cases with deterioration from those without (and hence it assumed initial knowledge of brain damage).

Many studies have now been published on the DI. This paper is concerned only with those investigating the validity of the tests as a measure of brain damage. At least six studies concur in reporting unfavorably on its use. Thus Gutman (25), using 30 organics and 30 controls, found that the Wechsler DI correctly identified only 43 per cent of organics, the Reynell index 50 per cent, and the Hewson ratios 60 per cent; whereas the DI misclassified 33 per cent of the normal group, the Reynell index 30 per cent, and the Hewson ratios 17 per cent. The three measures agreed in the diagnosis of brain damage in only 33 per cent of the cases. Five cases of clinically verified brain damage did not fall in the organic range on any of the tests. Allen (2), using as his criterion of deficit a loss greater than 20 per cent, found that the Wechsler DI definitely screened out only 54 per cent of the total study group of 50 patients. Rogers (53) evaluated the DI for seven groups (349 subjects) and found that, using a cutting score of 10 per cent, 75 per cent of subjects would be correctly identified, provided that only the brain-damaged and normal groups are used; but that, when other clinical groups are included, the results are no better than chance. Andersen (3), using 55 male soldiers with definite clinical evidence of brain damage, showed that, when a cutting score of 10 per cent was used, nearly one-third of the total sample fell outside the organic range, and when a cutting score of 20 per cent was used, nearly two-thirds of the patients fell inside the normal range. He divided his group of patients into those suffering predominantly from injury to the dominant hemisphere and those suffering from injury to the nondominant hemisphere. This did not materially improve the results. Kass (38) gave the test to 18 cases with known organic damage and 12 cases of dubious organic diagnosis, and concluded that the DI failed both in detecting and in confirming the presence of organic conditions resulting largely from traumatic brain injury. As a percentage-loss method for expressing psychological deficit, it was found inapplicable in two-thirds of his cases. Diers and Brown (15), using 25 cases of multiple sclerosis, concluded that the DI was not sensitive enough to be used clinically. Garfield and Fey (21) found that an equal number of psychotic and nonpsychotic patients obtained pathologically high DI’s, suggesting that the overlap between organics and functionals would be quite high. Magaret and Simpson (44) found that the DI rating did not correlate with the psychiatrist’s ratings of degree of deterioration. The only study so far to produce reasonably favorable results with the DI was that by McFie and Piercy (43). Using 56 brain-damaged patients and a cutting score of 10 per cent, they were able to identify 43 (71 per cent) of them; using a cutting score of 20 per cent, they identified 37 (66 per cent). No functional patients were tested. In view of the unfavorable results summarized above, it seems clear that indices of deterioration are of little clinical use in their present form.
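
Most of the figures above are hit rates (organics correctly flagged) and false-positive rates (controls wrongly flagged) at a chosen cutting score. The two move together as the cutting score changes, which is the trade-off visible in Andersen’s results at 10 and 20 per cent. A minimal sketch of the computation, assuming that a score at or above the cutting score is flagged as organic; the function name is hypothetical and the scores invented:

```python
def rates_at_cutting_score(organic_scores, control_scores, cutting_score):
    """Percentage of organics flagged (hits) and of controls flagged (false positives).

    Assumed rule: a score at or above the cutting score is flagged as organic.
    """
    hits = sum(s >= cutting_score for s in organic_scores)
    false_pos = sum(s >= cutting_score for s in control_scores)
    return (100 * hits / len(organic_scores),
            100 * false_pos / len(control_scores))


# Invented DI values (percentage loss) for five organics and five controls.
organics = [5, 12, 18, 25, 30]
controls = [0, 3, 8, 11, 15]
print(rates_at_cutting_score(organics, controls, 10))  # (80.0, 40.0)
print(rates_at_cutting_score(organics, controls, 20))  # (40.0, 0.0)
```

Raising the cutting score removes false positives only at the price of missing more genuine organics, so a single percentage “correctly identified” means little unless the false-positive rate for the comparison groups is reported alongside it.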

Halstead (28), using factorial analysis and various systems of weighting, developed a battery of tests which discriminated at a high level of confidence between normals and patients with lesions of the frontal lobes. The ten tests having the highest t values were selected as the basis of an impairment index. In this arrangement, an individual whose scores fell below the criterion scores on all ten of the key tests had an impairment index of 0.0; on a simple proportion basis, an individual who reached the criterion score on three of the ten key tests had an impairment index of 0.3, and one who reached it on all of the key tests, an index of 1.0. Using a cutting score of three, he was able to identify all 27 cases of frontal lobe injury and 29 out of 30 normals. The impairment index did not discriminate between normals and other cases of brain damage.
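
The impairment index is a simple proportion, as the sketch below shows. It reads the “cutting score of three” as “criterion reached on at least three of the ten key tests”; that reading, the function names, and the example pattern are assumptions made for illustration.

```python
def impairment_index(meets_criterion):
    """Proportion of the key tests on which the criterion score is reached."""
    if not meets_criterion:
        raise ValueError("need at least one key test")
    return sum(meets_criterion) / len(meets_criterion)


def at_or_above_cutting_count(meets_criterion, cutting_count=3):
    """Assumed reading of the cutting score: criterion reached on at least
    `cutting_count` of the key tests."""
    return sum(meets_criterion) >= cutting_count


# Invented pattern: criterion reached on 3 of the 10 key tests.
pattern = [True, False, True, False, False, True, False, False, False, False]
print(impairment_index(pattern))           # 0.3
print(at_or_above_cutting_count(pattern))  # True
```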

Of the ten tests, the Halstead Category Test (28, 29), involving the ability of the subject to “abstract” various organizing principles such as “size,” “shape,” “color,” etc. from a series of 336 stimulus figures presented visually and serially by means of a multiple-choice projection apparatus, proved particularly successful. Using a cutting score of 70, he correctly identified 27 out of 29 normals and 10 out of 11 cases with frontal lobe injuries.

When, therefore, the patient is known on other grounds to be neither psychotic nor neurotic, this battery of tests offers a very accurate indication of whether or not the lesion is situated in the frontal lobes. The impairment index was validated on a group different from the standardization group, and the validation was repeated on a further independent group. The only obvious objection to the index is the inadequate representation of groups other than normals and the brain-damaged.


Perceptual and Motor Tests of Brain Damage


Many attempts have been made to use the Rorschach test for the diagnosis of brain damage. The approach has usually been made in terms of signs specific to brain damage. This approach is basically similar to that used in tests employing the concept of deterioration, for the signs are taken to indicate inadequate or lowered performance. One of the earliest approaches was made by Piotrowski (49, 50), who used 33 records, consisting of 18 brain-damaged cases, 10 cases with noncerebral disturbance of the central nervous system, and 5 cases of conversion hysteria. Ten signs were selected as differentiating between the three groups. These signs were very carefully defined by Piotrowski. The organic group produced a mean of 6.2 signs; the other groups, a mean of 1.5 signs; there was no overlap between the groups. Thus, it was considered that the presence of five or more of the Piotrowski signs was indicative of brain damage. In a later paper (51), Piotrowski showed that the number of signs produced was a function, in part, of the severity of the personality changes produced by the disorder, and that the number of signs also increased with age. He also pointed out that some of his signs were produced by schizophrenics and neurotics. He claimed that, out of 56 patients producing five or more signs, 55 were in fact organic; of 25 patients originally considered organic but later rejected as such by neurological tests, only one produced an organic record.
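
Sign-based scoring of this kind reduces to counting the signs present in a scored record and comparing the count with a threshold of five. In the sketch below the sign names are placeholders (the actual ten signs are defined by Piotrowski and are not reproduced here), the function names are hypothetical, and the example record is invented.

```python
SIGN_THRESHOLD = 5  # five or more of the ten signs taken as indicative


def count_signs(record):
    """Number of signs marked present in a scored record (sign name -> bool)."""
    return sum(bool(present) for present in record.values())


def organic_by_signs(record, threshold=SIGN_THRESHOLD):
    return count_signs(record) >= threshold


# Placeholder sign names; an invented record showing the first six signs.
record = {f"sign_{i:02d}": i <= 6 for i in range(1, 11)}
print(count_signs(record), organic_by_signs(record))  # 6 True
```

Because every sign counts equally here, the method cannot express the later finding that some signs also appear in schizophrenics and neurotics; that limitation is what the weighted schemes of Ross and Hughes address.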

Ross (54) tried to replicate the findings of Piotrowski. He used two groups which closely approximated those used by Piotrowski. He also tested several other groups and found that the signs occurred highly significantly more often in the group with cerebral lesions, and significantly more often in the epileptics, than in all the other groups together. However, although five or more of these signs were found most often in patients with disease of the cerebral cortex and subcortical tissue, they were not specific for these lesions. Thus, 55 per cent of brain-damaged patients showed five or more signs, but so did 30 per cent of those with noncortical lesions of the central nervous system, and also 20 per cent of the psychotics and 14 per cent of the neurotics. Furthermore, Ross showed that 50 per cent of the cortical cases and 53 per cent of the epileptic cases showed five or more of the Harrower-Erickson signs of neurosis.

In two later papers (55, 57), Ross 
divided the signs into four groups: 
those common to neurotic and or- 
ganic patients; the neurotic differen- 
tial signs; the organic differential 
signs; and the organic excluding 
signs. Each sign was then weighted 
in rough proportion to its differential 
incidence in the groups of individuals
being compared. The four sets of
scores were then combined for each
individual to give two ratings. These
two ratings were called the “instability”
and “disability” ratings, and
were standardized on 50 neurotics, 24
organics, 50 superior, and 50 average
normals. This represented an advance
on Piotrowski’s method, in that a pa- 
tient could be given a rating on both 
the organic and the neurotic dimen- 
sions. 
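Ross's procedure amounts to giving each sign a weight that grows with the difference in its incidence between two groups, then summing the weights of the signs present in a record. Ross's actual weights and his rule for combining the four sets of scores are not reproduced here, so the sketch below is only an illustration under stated assumptions: the sign names, the incidences, and the scale of one weight point per ten percentage points of difference are all hypothetical.

```python
def sign_weight(pct_group_a: int, pct_group_b: int, points_per_10: int = 1) -> int:
    """Hypothetical weight: one point per 10 percentage points of difference in
    incidence between group A (e.g., organics) and group B (e.g., neurotics).
    The weight is negative when the sign is commoner in group B."""
    return round((pct_group_a - pct_group_b) / 10 * points_per_10)


def rating(present: set[str], weights: dict[str, int]) -> int:
    """Sum the weights of the signs present in one record."""
    return sum(w for sign, w in weights.items() if sign in present)


# Hypothetical incidences, in per cent: (organic group, neurotic group).
incidence = {"sign_1": (60, 10), "sign_2": (40, 20), "sign_3": (15, 35)}
weights = {s: sign_weight(a, b) for s, (a, b) in incidence.items()}
print(weights)                                 # {'sign_1': 5, 'sign_2': 2, 'sign_3': -2}
print(rating({"sign_1", "sign_3"}, weights))   # 3
```

Because weights can be negative, a sign commoner in the comparison group pulls the rating down, which is what lets a single record be placed on a dimension rather than merely counted.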

The next important contribution came from Hughes (32, 33), who
derived 14 signs from a factor analysis of the 22 signs he found in
the literature on brain damage. Different weights were assigned to
these signs according to their discriminatory power. He used 218
subjects, including 50 with brain damage, 68 schizophrenics, 74
neurotics, 4 manic-depressives, and 22 normals. The point-biserial
correlation between presence or absence of brain damage and score
on the sign pattern was +.79, which was highly significant. When a
cutting score of seven or above was used, 82 per cent of the
organics were correctly identified, while only 1 per cent of
nonorganics were falsely identified as organic. Using Piotrowski’s
signs, Hughes could correctly identify only 20 per cent of the
organics without including many nonorganics.
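A point-biserial correlation is simply the Pearson correlation between a two-valued variable (brain damage present or absent) and a continuous one (the weighted sign score). A minimal Python sketch of the usual formula follows; the ten scores are invented for illustration and are not Hughes's data.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(group: list[int], scores: list[float]) -> float:
    """Point-biserial correlation between a 0/1 variable and a continuous score.

    r_pb = (M1 - M0) / s * sqrt(p * q), where M1 and M0 are the mean scores of
    the cases coded 1 and 0, s is the population standard deviation of all
    scores, p is the proportion coded 1, and q = 1 - p.
    """
    if len(group) != len(scores):
        raise ValueError("group and scores must have the same length")
    ones = [x for g, x in zip(group, scores) if g == 1]
    zeros = [x for g, x in zip(group, scores) if g == 0]
    if not ones or not zeros:
        raise ValueError("both groups must be non-empty")
    p = len(ones) / len(scores)
    return (mean(ones) - mean(zeros)) / pstdev(scores) * sqrt(p * (1 - p))


# Hypothetical sign-pattern scores: 1 = brain damage present, 0 = absent.
group = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [9, 8, 7, 10, 2, 3, 1, 4, 2, 3]
print(round(point_biserial(group, scores), 2))   # 0.94
```

A high coefficient on one sample says how well the score separates those groups; as the next study shows, it does not guarantee that the separation survives when a variable such as intelligence is controlled.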

However, these encouraging re- 
sults were shown to have a serious 
flaw by the recent work of Diers and 
Brown (16). They took a group of 
25 multiple sclerotics and divided 
them into two groups, 15 with an 
IQ above 102, and 10 with an IQ below
102. This latter group was carefully
matched with 11 nonorganic
patients with IQ’s below 102. The 
three groups were then tested and the 
results analyzed for Hughes’s signs. 
The mean number of signs for the 
more intelligent organics was only
0.33, none obtaining a score above 
seven, and ten obtaining a zero or 
negative score. As opposed to this 
the organics of average or dull IQ obtained
a mean of 5.2 signs, and the
nonorganics of average or dull IQ 
obtained a mean of 4.18 signs. None 
of the patients in these groups ob- 
tained a negative score. Diers and 
Brown concluded that there is an 
inverse relationship between IQ meas- 
urement on the Wechsler and the 
weighted Hughes score, independent 
of the factor of intracranial pathology.
The Hughes signs are, therefore, invalid unless
obtained with a patient of high intelligence;
and even then the study showed that none
of the intelligent organics used by Diers and
Brown fell within the organic range.

A new approach was made to the 
problem by Dörken and Kral (17),
who, instead of asking what signs 
the organic patient would show on 
the Rorschach, investigated what 
signs he would be likely not to show. 
Seven signs were determined, and 
each sign was weighted according to 
degree of occurrence in organic or 
nonorganic states. By this means, 
a total possible score of ten was de- 
termined. In terms of the frequency 
distribution, the cutting point was 
fixed between two and three; that is, 
scores from three to ten, inclusive, 
exclude a diagnosis of brain damage. 
Using this method, they claimed that 
92.9 per cent of organics were identified,
and 83.3 per cent of nonorganics were
identified as such. If Piotrowski’s signs
were used, however,
only 50 per cent of the organics would 
be identified; while use of the Ross 
“disability” ratio identified 75 per 
cent of the organics. 
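The scoring logic of Dörken and Kral, in which a high total on signs that organics tend not to show argues against brain damage, can be sketched as follows. The seven signs and their weights are not given here, so the names and weights below are hypothetical placeholders chosen only so that the maximum possible score is ten; the cutting point between two and three is taken from the text.

```python
# Hypothetical weights for seven "excluding" signs, chosen only so that the
# maximum possible total is 10, as in the text.
WEIGHTS = {"sign_1": 2, "sign_2": 2, "sign_3": 2, "sign_4": 1,
           "sign_5": 1, "sign_6": 1, "sign_7": 1}
CUTTING_POINT = 2  # totals of 3 to 10 inclusive exclude brain damage


def exclusion_score(present: set[str]) -> int:
    """Total weight of the excluding signs found in a record; unknown names are ignored."""
    return sum(WEIGHTS[s] for s in present if s in WEIGHTS)


def brain_damage_excluded(score: int) -> bool:
    """True when the total lies above the cutting point (3 to 10)."""
    return score > CUTTING_POINT


score = exclusion_score({"sign_4", "sign_5"})
print(score, brain_damage_excluded(score))   # 2 False
```

The design inverts the usual sign approach: a low total does not prove brain damage, it only fails to rule it out.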

One additional study may be mentioned here. Buhler, Buhler, and
Lefever (8), using 30 normals, 70 neurotics, 50 psychopaths, 27
schizophrenics, and 30 organics, developed a Basic Rorschach Score.
They claimed that this score was capable of separating clinical
groups in a statistically reliable manner. But they also stated
that

the variability of scores within each clinical group indicates
that the Basic Rorschach Score alone is not a sufficient basis for
individual diagnosis. The placement of the individual case on a
scale of adjustment or ego-integration does, however, appear to be
a highly probable outcome of the use of the Basic Rorschach Score
(8, p. 161).


In other words, the technique is de- 
signed, not to discriminate between 
organic and other groups as such, 
but rather to give an estimate of the 
degree of mental illness—a procedure 
which is based, apparently, on the 
theory that persons vary on a single 
dimension from normal through neu- 
rotic to psychotic, and which has 
been criticized by Eysenck (18, 19). 
A more recent monograph by Buhler, 
Lefever, Kallstedt, and Peak (9) con- 
firmed the results obtained in the 
initial study. 

The major criticisms to be made 
of the Rorschach as a test of brain 
damage seem to be that in all the 
scoring systems, except that of Dörken
and Kral, the greatest weighting
is given to those factors that are the 
most difficult to score and that de- 
pend, therefore, to the highest ex- 
tent on the subjective evaluation of 
the examiner. Second, although all 
authors using the sign approach seem 
to obtain adequate differentiation of 
groups for their particular study, the 
differentiating power invariably 
drops considerably when these meth- 
ods are repeated by other workers. 
Thus, where Piotrowski claimed pow- 
erful discrimination for his signs, 
Dörken and Kral found that they
identified only 50 per cent of their
organic group; and other groups of
signs used by Ross declined to an effi- 
ciency of 75 per cent. Third, most of 
the studies have ignored the influ- 
ence of age and intelligence; the 
classic example of this is, of course, 
the work of Hughes, which at first 
sight seemed so promising. We must 
conclude, therefore, that, while most 
workers in this field report satisfactory
discrimination, the constant “dog eat dog”
method by which one set
of signs is set aside as unusable and 
replaced by a new set by subsequent 
workers does not inspire confidence 
in the validity of the most recent 
method, that of Dörken and Kral.
That the Rorschach offers distinct 
promise in this problem cannot be 
denied; that it has been shown to be 
a satisfactory test of brain damage is 
open to question. 

Several tests have been recently 
constructed which are not very well 
known. Two of these are described 
in the monograph by Armitage (4). 
In their original form, these tests are 
known as the Trail Making Test and 
the Patch Test. The Trail Making 
Test consists of two parts. Part A 
consists of a sheet of paper on which 
there is printed a series of circles. In 
the center of each of these is a num- 
ber, with a range for the test proper 
of 1-25, and for the sample, which is 
on the opposite side of the page, of 
1-7. These circles are spatially ar- 
ranged in a random order. The pa- 
tient is required to draw a line be- 
tween the circles, starting at number 
1 and finishing at number 25. He is 
asked to work as rapidly as possible 
and to erase errors. 


Part B is very 
similar, but more complex, in that 
the numbered circles are mixed with 
circles lettered from A onwards. The 
task is to draw a line from circle one 
to circle A; then to go to circle two, 
then to circle B, and so on. 


Scoring is in terms of time and accuracy. If no errors are made, or
if an error is corrected very quickly, the score obtained depends
directly on the speed with which the test is completed. If the item
is accomplished in less than 20 seconds, a maximum score of ten is
given; however, if longer times are taken, partial credits are
given. The Patch Test requires the duplication of nine colored
circular patterns, one serving as a demonstration design. For every
pattern, the materials provided consist of 19 paper circles of
different colors. Some of these are solid, while others have
different sections of the interior part removed so that, by placing
them on top of one another, a pattern may be formed. The test
materials are arranged in front of the patient in a standard order.
The eight test designs are in graded order of difficulty, and the
test is terminated after two failures. Each design, with one
exception, can be duplicated only by putting together specific
pieces out of the 19 provided. In the standardization of these
tests, Armitage used as subjects 44 patients known to have
sustained brain damage (9 focal, 17 focal-diffuse, 18 diffuse). The
control group consisted of 45 normal subjects and 16 mild
neurotics. The groups were considered roughly comparable as to age,
pre-injury level of education, and occupation.

It was found that Trail Making Test A, the simplest of the tests,
discriminated the brain-damaged group best from the two control
groups. With a cutting score of 10, the total misclassification was
16 out of 95, 32 of the 44 organics being positively identified.
Trail Making Test B identified 39 out of 44 organics, but
misclassified 15 out of 51 normals; and a combined score was not
more effective. As a clinical instrument, the Patch Test identified
26 out of 43 organics, misclassifying 7 out of 51 controls. The
tests would appear to be useful ones and merit further study.
Unfortunately, no data are given on the performance of psychotic
patients, and the study has not been repeated.
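The figures reported for Armitage's standardization can be turned into hit rates with simple arithmetic. In the sketch below, the number of controls for Trail Making Test A (51) and its false positives (4) are inferred from the stated totals: 16 of 95 misclassified, with 12 of the 44 organics missed. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    hits: int           # organics correctly identified
    organics: int       # organics tested
    false_alarms: int   # controls misclassified as organic
    controls: int       # controls tested

    def sensitivity(self) -> float:
        return self.hits / self.organics

    def specificity(self) -> float:
        return 1 - self.false_alarms / self.controls

    def misclassified(self) -> int:
        return (self.organics - self.hits) + self.false_alarms


# Trail Making A: 16 - (44 - 32) = 4 controls misclassified, out of 95 - 44 = 51.
results = {
    "Trail Making A": Result(32, 44, 4, 51),
    "Trail Making B": Result(39, 44, 15, 51),
    "Patch Test": Result(26, 43, 7, 51),
}
for name, r in results.items():
    print(f"{name}: sensitivity {r.sensitivity():.2f}, "
          f"specificity {r.specificity():.2f}, misclassified {r.misclassified()}")
```

On these figures Test B gains sensitivity over Test A at the cost of many more misclassified controls (20 errors against 16), which fits the statement that Test A discriminated best.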

The genesis and construction of 
the Block Design Rotation Test has 
been described in three articles by 
Shapiro (61, 62, 63). This test re- 
sulted from the observation that 
some patients, while doing the Gold- 
stein block design test, left the com- 
pleted design in a rotated position 
compared with the test figure. This 
rotation might be as great as 45°, but 
rarely exceeded it. Various hypothe- 
ses were examined and it was found 
that the rotation effect could be 
maximized when three factors were 
interrelated in a certain way,
namely, the factors of figure shape, 
ground shape, and angle of the line 
of symmetry (the line of symmetry 
being defined as the line which di- 
vided a design into mirrored halves). 
Thus, rotation was found to be max- 
imal when the figure and ground 
shapes were in a diamond orientation 
and the line of symmetry was at an 
angle. 

From various theoretical and em- 
pirical considerations, it was de- 
duced that the rotation effect would 
be maximized under these conditions 
in brain-damaged patients. Accordingly,
a special set of cards was devised,
40 in all. Each design could be
reproduced by using four Kohs blocks. 
Two groups of subjects, 19 brain- 
damaged and 19 psychiatric patients 
(the latter without brain damage), 
were carefully matched for age and 
sex, and given the test under identical
conditions. Results were according to
prediction, the brain-damaged patients
rotating an average of 8° per card, while
the functional patients rotated an average of
only about 2° per card. Considering 
the matched pairs together, every 
brain-damaged patient rotated more 
than the corresponding functional 
patient, with one exception—and 
this patient’s diagnosis was later 
changed independently to one of 
compensation neurosis. With a cut- 
ting score of 6° rotation per card, the 
test identified 14 out of the 19 brain- 
damaged patients, but misclassified 
only 1 functional patient. Nearly all 
the functional patients rotated a 
total of less than 200° for the 40 cards, 
and only one rotated more than 
250°. Many brain-damaged pa- 
tients, on the other hand, rotated 
more than 400° or 500° for the 40 
cards, and much higher totals have 
been encountered in clinical practice. 
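The 6° cutting score is expressed per card, whereas several of the figures above are totals over the 40 cards. A small sketch converting between the two follows; it assumes that a mean rotation at or above the cutting score counts as an organic classification, since the text does not say how a score of exactly 6° was treated.

```python
N_CARDS = 40          # cards in the Block Design Rotation Test, as described above
CUTTING_SCORE = 6.0   # degrees of rotation per card


def mean_rotation_per_card(total_degrees: float, n_cards: int = N_CARDS) -> float:
    """Average rotation per card from a total over all cards."""
    return total_degrees / n_cards


def classified_brain_damaged(total_degrees: float) -> bool:
    """Assumption: a mean at or above the cutting score is classed as brain-damaged."""
    return mean_rotation_per_card(total_degrees) >= CUTTING_SCORE


for total in (200, 250, 400, 500):
    print(total, mean_rotation_per_card(total), classified_brain_damaged(total))
```

On this reading, a total of 200° corresponds to 5° per card and falls below the cutting score, while 250° corresponds to 6.25° per card and falls just above it.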

When a further group of 19 brain-damaged
patients (this time consisting of patients
who had actually been operated on) was
tested, a closely similar pattern of results
was found.
Data on several normal control 
groups showed that the test also dis- 
criminated between normals and 
brain-damaged patients, and revealed 
the curious and unexpected fact that, 
as a group, the controls rotated sig- 
nificantly more than the functionals 
(though this was significant at the 
.05 level only). The effects of age,
sex, and intelligence have also been 
calculated and will be published 
shortly. Thus, the rotation test satis- 
fies the criteria laid down in this 
paper in that it provides data on at 
least three groups of subjects, it takes 
into account factors such as intelli- 
gence, age, and sex, and its results 
have subsequently been confirmed on 
new, independent groups. It is also 
an entirely objectively scored test, 
the amount of rotation being recorded
(in the later studies) photographically,
with reference to a constant base line.

Using the original two groups of 
subjects, Shapiro (unpublished data) 
also found that the manual dexterity 
test and the finger dexterity test of 
the U.S. Employment Service battery 
of tests discriminated very signifi- 
cantly between brain-damaged and 
functional patients. The former test 
identified 14 out of the 19 brain-damaged
patients and misclassified 3 out of the 19
functionals; the latter identified 12 out of
the 19 brain-damaged patients and
misclassified 5 out of the 19 functionals.
Used together, the tests successfully
identified 16 of the brain-damaged patients.

DISCUSSION

It is doubtful whether any aspect 
of psychological testing has been 
more inadequately treated than the 
diagnostic assessment of brain dam- 
age. From a wide range of possible 
criticisms, some of the most obvious 
will be cited: 

a. In many instances, validation studies have not used comparable
groups. For example, the majority of cases in Hunt's (34)
brain-damaged group were paretics and others suffering from diffuse
brain damage. In his validation study, Armitage (4), on the other
hand, included many patients suffering from traumatic head injury
as the result of penetrating missiles. There is no reason to
suppose that these two groups were at all comparable with regard to
type, locus, or severity of brain damage. Armitage’s procedure may
be justified, however, in so far as Hunt described his tests as a
measure of brain damage without further qualification.

b. Few authors, when constructing their 
tests, choose their cases with sufficient care, 
in most instances considering brain damage as 
a unitary factor. That this is unsound from 
the point of view of diagnosis may be shown 
by evidence from many sources. Thus, if a 
random group of brain-damaged patients is 
given a test, the results nearly always fall 
into an abnormal distribution (i.e., are 
skewed). Many of the brain-damaged group 
behave on a given test like normal controls or functional patients,
while others obtain very abnormal scores. Frequently, this sort of
pattern accounts for the significance of differences between the
groups. This kind of distribution was encountered in the Block
Design Rotation Test and was recently seen in a study by Battersby,
Teuber, and Bender (6) on the behavior of brain-damaged patients in
a problem-solving situation. From an anatomical and physiological
standpoint, there is likewise no reason why all brain-damaged
patients should belong together. As Penfield and Evans (48) pointed
out, there is a wealth of difference between the brain damage
resulting from scar formation on the temporal lobe following an
accident, and the scar formation resulting from a temporal
lobectomy. Similarly, there is no reason to suppose that a
leucotomy operation will have the same effect as diffuse brain
damage.

c. Many investigators neglect almost completely the elementary
necessity for evaluating and controlling various relevant factors
such as age, sex, and intelligence. The Shipley-Hartford Retreat
Scale, the Rorschach, and the Goldstein test are noteworthy
examples where failure to do this has led to ambiguity in the
clinical use of the test. That such factors as these cannot be
neglected without serious risk of error is apparent from an article
by Hebb (30). Using simple patterns that had to be reproduced with
pieces of wood, Hebb found that “no pattern could be devised, which
was so easy that all patients in the public wards of a general
hospital could succeed with it in one minute, even though other
tests showed that one was not dealing with a population of mental
defectives” (30, p. 16). Hebb concludes that, although this kind of
material tends to be eliminated in tests which are adequately
standardized, “in special tests which have not been standardized,
there is a real danger of assuming that a variation from the norm,
which is frequently obtained for the normal population, can be due
only to the effects of cerebral injury” (30, p. 17). In addition,
many authors fail to state whether or not their subjects were
tested for specific defects, even when this is clearly
necessary. The Trail Making and Patch tests, 
for instance, might well be performed badly 
by normal persons with defective eyesight. 
These tests might be peculiarly successful in 
identifying soldiers suffering from penetrating 
wounds as brain-damaged, and be completely 
unsuccessful in cases of diffuse brain damage. 

d. Many tests employ very unreliable and 
subjective scoring systems. This applies
particularly to the Rorschach, where most weight
is given to factors such as impotence,
perplexity, etc., but also to the Hunt-Minnesota
Test, where, for example, more credit is given 
for a response within one-half of a second than 
for a response within one-half to one second. 

e. Occasionally, there is failure to realize 
that a test may discriminate between two 
groups statistically at a high level of signifi- 
cance, and yet still be unusable clinically be- 
cause the misclassification would be very high. 
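Point e can be made concrete with a hypothetical example. Suppose scores in two groups are normally distributed with equal spread and the group means lie half a standard deviation apart. With 200 patients per group the difference is highly significant, yet a cutting score placed midway between the means still misclassifies roughly 40 per cent of individuals. The group sizes and the effect size are invented, and the sketch assumes equal-variance normal distributions with a known standard deviation.

```python
from math import erf, sqrt


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))


def z_for_mean_difference(d: float, n_per_group: int) -> float:
    """z statistic for a difference of d standard deviations between two
    equal-sized groups, with the common standard deviation treated as known."""
    return d / sqrt(2 / n_per_group)


def misclassification_rate(d: float) -> float:
    """Expected error rate with the cutting score midway between the two means,
    assuming equal-variance normal distributions and equal group sizes."""
    return normal_cdf(-d / 2)


d, n = 0.5, 200   # hypothetical: means half an SD apart, 200 patients per group
print(f"z = {z_for_mean_difference(d, n):.1f}")             # z = 5.0
print(f"misclassified = {misclassification_rate(d):.0%}")   # misclassified = 40%
```

The significance test grows with the square root of the group size, while the misclassification rate depends only on the separation between the distributions; enlarging the sample cannot make such a test clinically usable.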

f. In many tests, there is failure to control 
relevant variables. Thus, in the Hunt-Minne- 
sota test, it is claimed that immediate and 
long-term memory are being measured. Ex- 
amination of the items, however, reveals 
failure to control for differential learning abil- 
ity. 


These criticisms are made on 
methodological grounds; there seems 
no reason why they should not be 
overcome. Assuming that they were 
overcome, would it be possible to de- 
velop adequate tests of brain dam- 
age? The answer would appear to be 
in the negative, as long as a purely 
engineering approach is made. Thus 
a most painstaking study by Lynn, 
Levine, and Hewson (42) was con- 
cerned with the aftereffects of ex- 
posure to blast during the war. Such 
exposure might result in a closed 
head injury with accompanying resid- 
ual symptomatology determined by 
the brain damage alone; or in a neu- 
rotic syndrome, identical clinically 
to that attributable to brain damage, 
but without any actual trauma to the 
brain; or (most commonly) in a com- 
bination of the two. Starting with 
over 4,000 patients, they reduced 
this group to 81 “pure” cases and 
attempted to differentiate these by 
means of a battery of tests. Elabor- 
ate precautions were taken to mini- 
mize unreliability of diagnosis and 
to control for age, sex, intelligence, 
etc., and a large number of independ- 
ent validating groups were used. 


Nevertheless, for these “pure” groups
the misclassification was still 12 per
cent. For the validation groups, it is
arguable that the statistical techniques
were inadequate, but the results are still
disappointing.

The basic flaw in most of the tests discussed above lies, however,
in the theoretical approach. Examination of the tests suggests that
one major hypothesis concerning the psychological effects of brain
damage permeates the work; namely, that brain damage results in
deterioration of a relatively permanent nature. Such a theory is
unlikely to provide a satisfactory basis for the construction of
tests of brain damage because it is not exclusive to brain damage.
This criticism seems more important than the usual one that
deterioration is an ill-defined concept, so much so that Hunt (35)
preferred the neutral term “psychological deficit.” Similarly,
provided the test is rigorously scored, the assumption that
vocabulary level is resistant to the effects of mental illness has
been shown to be false by the work of Yacorzynski (69), Capps (12),
and Simmins (66); while Crown (14), using 4 cases of myxedema
(which causes apparent intellectual deterioration) tested before
and after treatment, found that the Shipley vocabulary MA of these
patients rose by an average of 10.2 months following treatment by
thyroxine, and on another vocabulary test, the mean vocabulary IQ
rose by 6 and 14 points in two of the cases.


The first essential in the construction of tests of brain damage,
therefore, is the development of a theory which is exclusive to
brain damage. One possible theory has been presented by Shapiro
(61, 62, 63) in his articles, and may be stated as follows: Brain
damage results in the development of states of exaggerated
inhibition. Such a theory is more satisfactory than a theory of
deterioration (in so far as it is supported by experimental
evidence) because it would not be postulated to account for
functional disorder or for the behavior of normals. Hence, it
becomes possible to make deductions and set up experimental
situations in which brain-damaged patients behave differently from
others, and through which verification of the theory and
development of tests of brain damage are feasible. Having done this
and verified the theory in its broad outlines, it now becomes
possible to consider alternative hypotheses and also to consider
why some brain-damaged patients do not behave as predicted. One
immediate possibility, of course, is location of the damage.
Halstead (28) and Rylander (58, 59) concur in demonstrating that it
appears to be much easier to distinguish brain damage in the
frontal lobes from that in other areas of the brain. Another
possibility is the significance of previous personality. The
interaction of personality structure and the effects of brain
damage has been largely ignored by most investigators, although it
is known that extensive brain damage may have little or no effect
on functioning, and that a brain-damaged patient may be neurotic or
psychotic. Now Eysenck (18) has shown that there are tests which
discriminate between psychotics and normals but not between
neurotics and normals, and vice versa; he has argued, on this
basis, that neuroticism and psychoticism are orthogonal
(independent) factors. At this stage, therefore, it would be
possible to construct a battery of tests which would give a
particular patient a factor score on each of these three
dimensions: psychoticism, neuroticism, and brain damage. The
question to be answered would then become not whether the patient
is psychotic or neurotic or brain damaged, but what his performance
is with respect to these three variables, and what their
interrelationship is. At this point, the internal validity of the
tests would become important, since it would be necessary to show
that those tests measuring the effects of brain damage have
significantly positive intercorrelations among themselves, but not
with the tests of psychoticism or neuroticism.

It is clear, therefore, that the problem is much more complex than
was suggested in the early part of this paper. In the view of the
writer, a purely empirical approach is unlikely to yield
satisfactory results, nor is an approach based on a theory which
has not been adequately tested experimentally. A satisfactory test
of brain damage should be based on a reasonable theory that has
been experimentally tested, has been supported by adequate
statistical treatment, and has taken into account all relevant
variables. Such an approach would at least help to overcome the
impasse which seems to have been reached with many of the tests
reviewed above.

Many social scientists other than psychologists try to account for
the behavior of individuals. Economists and a few psychologists
have produced a large body of theory and a few experiments that
deal with individual decision making. The kind of decision making
with which this body of theory deals is as follows: given two
states, A and B, into either one of which an individual may put
himself, the individual chooses A in preference to B (or vice
versa). For instance, a child standing in front of a candy counter
may be considering two states. In state A the child has $0.25 and
no candy. In state B the child has $0.15 and a ten-cent candy bar.
The economic theory of decision making is a theory about how to
predict such decisions.

Economic theorists have been concerned with this problem since the
days of Jeremy Bentham (1748–1832). In recent years the development
of the economic theory of consumer’s decision making (or, as the
economists call it, the theory of consumer’s choice) has become
exceedingly elaborate, mathematical, and voluminous. This
literature is almost unknown to psychologists, in spite of sporadic
pleas in both psychological (40, 84, 103, 104) and economic (101,
102, 123, 128, 199, 202) literature for greater communication
between the disciplines.

¹ This work was supported by Contract NSori-166, Task Order I,
between the Office of Naval Research and The Johns Hopkins
University. This is Report No. 166-1-182, Project Designation No.
NR 145-089, under that contract. I am grateful to the Department of
Political Economy, The Johns Hopkins University, for providing me
with an office adjacent to the Economics Library while I was
writing this paper. M. Allais, M. M. Flood, N. Georgescu-Roegen,
K. O. May, A. Papandreou, L. J. Savage, and especially C. H. Coombs
have kindly made much unpublished material available to me. A
number of psychologists, economists, and mathematicians have given
me excellent, but sometimes unheeded, criticism. Especially helpful
were C. Christ, C. H. Coombs, F. Mosteller, and L. J. Savage.

The purpose of this paper is to review this theoretical literature,
and also the rapidly increasing number of psychological experiments
(performed by both psychologists and economists) that are relevant
to it. The review will be divided into five sections: the theory of
riskless choices, the application of the theory of riskless choices
to welfare economics, the theory of risky choices, transitivity in
decision making, and the theory of games and of statistical
decision functions. Since this literature is unfamiliar and
relatively inaccessible to most psychologists, and since I could
not find any thorough bibliography on the theory of choice in the
economic literature, this paper includes a rather extensive
bibliography of the literature since 1930.


THEORY OF RISKLESS CHOICES²

Economic man. The method of those theorists who have been concerned
with the theory of decision making is essentially an armchair
method. They make assumptions, and from these assumptions they
deduce theorems which presumably can be tested, though it sometimes
seems unlikely that the testing will ever occur. The most important
set of assumptions made in the theory of riskless choices may be
summarized by saying that it is assumed that the person who makes
any decision to which the theory is applied is an economic man.

² No complete review of this literature is available. Kauder (105,
106) has reviewed the very early history of utility theory. Stigler
(180) and Viner (194) have reviewed the literature up to
approximately 1930. Samuelson’s book (164) contains an illuminating
mathematical exposition of some of the content of this theory.
Allen (6) explains the concept of indifference curves. Schultz (172)
reviews the developments up to but not including the Hicks-Allen
revolution from the point of view of demand theory. Hicks’s book
(87) is a complete and detailed exposition of most of the
mathematical and economic content of the theory up to 1939.
Samuelson (167) has reviewed the integrability problem and the
revealed preference approach. And Wold (204, 205, 206) has summed up
the mathematical content of the whole field for anyone who is
comfortably at home with axiom systems and differential equations.

What is an economic man like? He has three properties. (a) He is
completely informed. (b) He is infinitely sensitive. (c) He is
rational.

Complete information. Economic man is assumed to know not only what
all the courses of action open to him are, but also what the
outcome of any action will be. Later on, in the sections on the
theory of risky choices and on the theory of games, this assumption
will be relaxed somewhat. (For the results of attempts to introduce
the possibility of learning into this picture, see 51, 77.)

Infinite sensitivity. In most of the older work on choice, it is
assumed that the alternatives available to an individual are
continuous, infinitely divisible functions, that prices are
infinitely divisible, and that economic man is infinitely
sensitive. The only purpose of these assumptions is to make the
functions that they lead to continuous and differentiable. Stone
(182) has recently shown that they can be abandoned with no serious
changes in the theory of choice.

Rationality. The crucial fact about
economic man is that he is rational. 
This means two things: He can 
weakly order the states into which he 
can get, and he makes his choices so 
as to maximize something. 

Two things are required in order 
for economic man to be able to put all 
available states into a weak ordering. 
First, given any two states into which 
he can get, A and B, he must always 
be able to tell either that he prefers 
A to B, or that he prefers B to A, or 
that he is indifferent between them. 
If preference is operationally defined 
as choice, then it seems unthinkable 
that this requirement can ever be 
empirically violated. The second 
requirement for weak ordering, a 
more severe one, is that all prefer- 
ences must be transitive. If economic 
man prefers A to B and B to C, then 
he prefers A to C. Similarly, if he is 
indifferent between A and B and 
between B and C, then he is indifferent
between A and C. It is not obvious that
transitivity will always hold for human
choices, and experiments designed to find
out whether or not it does will be described
in the section on testing transitivity.
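The two requirements for a weak ordering can be checked mechanically on a finite set of states. In the sketch below, a pair (a, b) in the relation is read as "a is preferred to b or indifferent to it"; strict preference is then a R b without b R a, and indifference is a R b together with b R a. For a complete relation, transitivity of this combined relation guarantees both the preference and the indifference cases described above. The function names and the three-state examples are hypothetical.

```python
from itertools import product


def is_complete(states: list[str], at_least_as_good: set[tuple[str, str]]) -> bool:
    """Every pair of states is comparable: a R b or b R a (including a with itself)."""
    return all((a, b) in at_least_as_good or (b, a) in at_least_as_good
               for a, b in product(states, repeat=2))


def is_transitive(states: list[str], at_least_as_good: set[tuple[str, str]]) -> bool:
    """Whenever a R b and b R c, also a R c."""
    return all((a, c) in at_least_as_good
               for a, b, c in product(states, repeat=3)
               if (a, b) in at_least_as_good and (b, c) in at_least_as_good)


def is_weak_order(states: list[str], at_least_as_good: set[tuple[str, str]]) -> bool:
    return is_complete(states, at_least_as_good) and is_transitive(states, at_least_as_good)


states = ["A", "B", "C"]
reflexive = {(s, s) for s in states}
ordered = reflexive | {("A", "B"), ("B", "C"), ("A", "C")}   # A over B over C
cyclic = reflexive | {("A", "B"), ("B", "C"), ("C", "A")}    # A over B, B over C, C over A
print(is_weak_order(states, ordered))   # True
print(is_weak_order(states, cyclic))    # False
```

The cyclic example is complete but not transitive, which is exactly the kind of choice pattern the later experiments on transitivity look for.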


The second requirement of ra- 
tionality, and in some ways the more 
important one, is that economic man 
must make his choices in such a way 
as to maximize something. This is 
the central principle of the theory of 
choice. In the theory of riskless 
choices, economic man has usually 
been assumed to maximize utility. In 
the theory of risky choices, he is as- 
sumed to maximize expected utility. 
In the literature on statistical decision
making and the theory of games, various
other fundamental principles of decision
making are considered, but they are all
maximization principles of one sort or
another.
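The difference between maximizing utility and maximizing expected utility can be shown in a few lines. This is a minimal sketch: the square-root utility of money and the two gambles are invented for illustration, and nothing here is a claim about which utility function real people have.

```python
import math

# A gamble is a list of (probability, money outcome) pairs.
sure_thing = [(1.0, 40.0)]
coin_flip = [(0.5, 0.0), (0.5, 100.0)]

def expected_value(gamble):
    return sum(p * x for p, x in gamble)

def expected_utility(gamble, u):
    # Probability-weighted sum of the utilities of the outcomes.
    return sum(p * u(x) for p, x in gamble)

def best_option(options, u):
    """Return the name of the option with the largest expected utility."""
    return max(options, key=lambda name: expected_utility(options[name], u))

options = {"sure": sure_thing, "gamble": coin_flip}
u = math.sqrt  # assumed concave utility of money, for illustration only

print(expected_value(sure_thing), expected_value(coin_flip))  # 40.0 50.0
print(best_option(options, u))                                # sure
```

With a linear utility (`lambda x: x`) the same chooser picks the gamble, because expected utility then coincides with expected money value; the choice depends entirely on what is being maximized.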

The fundamental content of the 
notion of maximization is that eco- 
nomic man always chooses the best 
alternative from among those open 
to him, as he sees it. In more techni- 
cal language, the fact that economic 
man prefers A to B implies and is 
implied by the fact that A is higher 
than B in the weakly ordered set 
mentioned above. (Some theories in- 
troduce probabilities into the above 
statement, so that if A is higher than 
B in the weak ordering, then eco- 
nomic man is more likely to choose A 
than B, but not certain to choose A.) 

This notion of maximization is 
mathematically useful, since it makes 
it possible for a theory to specify a 
unique point or a unique subset of 
points among those available to the 
decider. It seems to me psychologically
unobjectionable. So many different kinds
of functions can be maximized that almost
any point actually available in an
experimental situation can be regarded as
a maximum of some sort. Assumptions about
maximization only become specific, and
therefore possibly wrong, when they
specify what is being maximized.

There has, incidentally, been almost no
discussion of the possibility that the two
parts of the concept of rationality might
conflict. It is conceivable, for example,
that it might cost a great deal in effort
(and therefore in negative utility) to
maintain a weakly ordered preference
field. Under such circumstances, would it
be “rational” to have such a field?

It is easy for a psychologist to point 
out that an economic man who has 
the properties discussed above is very 
unlike a real man. In fact, it is so 
easy to point this out that psychologists
have tended to reject out of hand the
theories that result from these
assumptions. This isn’t fair.
Surely the assumptions contained in 
Hullian behavior theory (91) or in 
the Estes (60) or Bush-Mosteller 
(36, 37) learning theories are no more 
realistic than these. The most useful 
thing to do with a theory is not to 
criticize its assumptions but rather 
to test its theorems. If the theorems 
fit the data, then the theory has at 
least heuristic merit. Of course, one
trivial theorem deducible from the
assumptions embodied in the concept
of economic man is that in any
specific case of choice these assumptions
will be satisfied. For instance,
if economic man is a model for real
men, then real men should always
exhibit transitivity of real choices.
Transitivity is an assumption, but it 
is directly testable. So are the other 
properties of economic man as a 
model for real men. 

Economists themselves are some- 
what distrustful of economic man 
(119, 156), and we will see in subse- 
quent sections the results of a num- 
ber of attempts to relax these as- 
sumptions. 

Early utility maximization theory. 
The school of philosopher-economists 
started by Jeremy Bentham and 
popularized by James Mill and others 
held that the goal of human action is 
to seek pleasure and avoid pain. 
Every object or action may be con- 
sidered from the point of view of 
pleasure- or pain-giving properties. 
These properties are called the utility 
of the object, and pleasure is given 
by positive utility and pain by nega- 
tive utility. The goal of action, then, 
is to seek the maximum utility. This 
simple hedonism of the future is 
easily translated into a theory of 
choice. People choose the alternative, 
from among those open to them, that 
leads to the greatest excess of positive
over negative utility. This notion of 
utility maximization is the essence of 
the utility theory of choice. It will 
reappear in various forms through- 
out this paper. (Bohnert [30] dis- 
cusses the logical structure of the 
utility concept.) 

This theory of choice was embodied 
in the formal economic analyses of all 
the early great names in economics. 
In the hands of Jevons, Walras, and 
Menger it reached increasingly so- 
phisticated mathematical expression 
and it was embodied in the thinking 
of Marshall, who published the first 
edition of his great Principles of 
Economics in 1890, and revised it at 
intervals for more than 30 years 
thereafter (137). 

The use to which utility theory was 
put by these theorists was to estab- 
lish the nature of the demand for 
various goods. On the assumption 
that the utility of any good is a 
monotonically increasing negatively 
accelerated function of the amount of 
that good, it is easy to show that the 
amounts of most goods which a consumer
will buy are decreasing functions of
price, functions which are precisely
specified once the shapes of the utility
curves are known. This is the result
the economists needed and is, of course,
a testable theorem. (For more on this,
see 87, 159.)

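The demand result can be checked numerically. The sketch assumes a particular increasing, negatively accelerated utility, u(Q) = a·ln(1 + Q) with a hypothetical a = 10, and a constant marginal utility of money (the same assumption the text later criticizes in Marshall's consumer's surplus), so the consumer buys the Q that maximizes u(Q) - p·Q. Setting the derivative a/(1 + Q) equal to p gives Q = a/p - 1, which falls as price rises.

```python
import math

def utility(q, a=10.0):
    # Assumed utility: monotonically increasing, negatively accelerated in q.
    return a * math.log1p(q)

def quantity_demanded(price, a=10.0, q_max=50.0, step=0.01):
    """Grid search for the q that maximizes utility(q) - price * q.

    Assumes each unit of money spent costs one unit of utility
    (constant marginal utility of money)."""
    best_q, best_net = 0.0, 0.0
    for i in range(int(q_max / step) + 1):
        q = i * step
        net = utility(q, a) - price * q
        if net > best_net:
            best_q, best_net = q, net
    return best_q

for price in (0.5, 1.0, 2.0, 5.0):
    # Numerical optimum next to the closed form a / p - 1.
    print(price, round(quantity_demanded(price), 2), 10.0 / price - 1)
```

The grid search is deliberately naive so that the only ingredient is the maximization itself; with a different concave utility the numbers change but the demand curve still slopes downward.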
Complexities arise in this theory 
when the relations between the 
utilities of different goods are
considered. Jevons, Walras, Menger,
and even Marshall had assumed that 
the utilities of different commodities 
can be combined into a total utility 
by simple addition; this amounts to 
assuming that the utilities of different 
goods are independent (in spite of 
the fact that Marshall elsewhere dis- 
cussed the notions of competing 
goods, like soap and detergents, and 
completing goods, like right and left
shoes, which obviously do not have 
independent utilities). Edgeworth 
(53), who was concerned with such 
nonindependent utilities, pointed out 
that total utility was not necessarily 
an additive function of the utilities 
attributable to separate commodities. 
In the process he introduced the no- 
tion of indifference curves, and thus 
began the gradual destruction of the 
classical utility theory. We shall re- 
turn to this point shortly. 

Although the forces of parsimony 
have gradually resulted in the elimi- 
nation of the classical concept of 
utility from the economic theory of 
riskless choices, there have been a 
few attempts to use essentially the 
classical theory in an empirical way. 
Fisher (63) and Frisch (75) have developed
methods of measuring marginal utility (the
change in utility [u] with an infinitesimal
change in amount possessed [Q], i.e., du/dQ)
from market data, by making assumptions
about the interpersonal similarity of
consumer tastes. Recently Morgan (141) has
used several variants of these techniques,
has discussed mathematical and logical
flaws in them, and has concluded on the
basis of his empirical results that the
techniques require assumptions too
unrealistic to be workable. The crux of
the problem is that, for these techniques
to be useful, the commodities used must
be independent
(rather than competing or complet- 
ing), and the broad commodity clas- 
sifications necessary for adequate 
market data are not independent. 
Samuelson (164) has shown that the 
assumption of independent utilities, 
while it does guarantee interval scale 
utility measures, puts unwarrantably 
severe restrictions on the nature of 
the resulting demand function. Elsewhere
Samuelson (158) presented, primarily as a
logical and mathematical exercise, a method
of measuring marginal utility by assuming
a particular time function. Since
no reasonable grounds can be found
for assuming one such function rather
than another, this procedure holds no
promise of empirical success. Marshall
proposed (in his notion of “consumer’s
surplus”) a method of utility measurement
that turns out to be dependent on the
assumption of constant marginal utility
of money, and which is therefore quite
unworkable. Marshall's prestige led to
extensive discussion and debunking of
this notion (e.g., 28), but little that is
positive comes out of this literature.
Thurstone (186) is currently attempting
to determine utility functions for
commodities experimentally, but has
reported no results as yet.

Indifference curves. Edgeworth’s
introduction of the notion of indifference
curves to deal with the utilities of
nonindependent goods was mentioned above.
An indifference curve is, in Edgeworth’s
formulation, a constant-utility curve.
Suppose that you like both apples and
bananas, and that you get the same amount
of utility from 10-apples-and-1-banana as
you do from 6-apples-and-4-bananas. Then
these are two points on an indifference
curve, and of course there are an infinite
number of other points on the same curve.
Naturally, this is not the only
indifference curve you may have between
apples and bananas. It may also be true
that you are indifferent between
13-apples-and-5-bananas and
5-apples-and-15-bananas. These are two
points on another,
higher indifference curve. A whole 
family of such curves is called an in- 
difference map. Figure 1 presents 
such a map. One particularly useful 
kind of indifference map has amounts 
of a commodity on one axis and 
amounts of money on the other. 
Money is a commodity, too. 

The notion of an indifference map 
can be derived, as Edgeworth derived 
it, from the notion of measurable 
utility. But it does not have to be. 
Pareto (146, see also 151) was seri- 
ously concerned about the assump- 
tion that utility was measurable up 
to a linear transformation. He felt 
that people could tell whether they 
preferred to be in state A or state B, 
but could not tell how much they 
preferred one state over the other. In 
other words, he hypothesized a utility 
function measurable only on an ordi- 
nal scale. Let us follow the usual 
economic language, and call utility 
measured on an ordinal scale ordinal 
utility, and utility measured on an 
interval scale, cardinal utility. It is 
meaningless to speak of the slope, or 
marginal utility, of an ordinal utility 
function; such a function cannot be 
differentiated. However, Pareto saw 
that the same conclusions which had 
been drawn from marginal utilities 
could be drawn from indifference 
curves. An indifference map can be 
drawn simply by finding all the
combinations of the goods involved
among which the person is indiffer- 
ent. Pareto’s formulation assumes 
that higher indifference curves have 
greater utility, but does not need to 
specify how much greater that utility 
is. 
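Pareto's point that indifference curves survive without cardinal utility can be illustrated with a hypothetical index u = apples × bananas, chosen for convenience and not meant to reproduce the bundles in the example above. Replacing u by any strictly increasing function of it (here u³ + 7, also arbitrary) leaves every comparison between bundles, and hence every indifference curve, unchanged. The individual "marginal utilities" change arbitrarily, but their ratio, which is the slope of the indifference curve (the marginal rate of substitution), does not.

```python
from itertools import combinations

def u(bundle):
    # Hypothetical utility index over (apples, bananas).
    apples, bananas = bundle
    return apples * bananas

def v(bundle):
    # A strictly increasing transform of u: an equally valid ordinal index.
    return u(bundle) ** 3 + 7

def sign(x):
    return (x > 0) - (x < 0)

def same_ordering(f, g, bundles):
    """True if f and g rank every pair of bundles the same way (ties included)."""
    return all(sign(f(x) - f(y)) == sign(g(x) - g(y))
               for x, y in combinations(bundles, 2))

def mu_u(bundle):
    a, b = bundle
    return (b, a)  # partial derivatives (du/da, du/db)

def mu_v(bundle):
    a, b = bundle
    k = 3 * u(bundle) ** 2  # chain rule: dv/du
    return (k * b, k * a)

def mrs(mu):
    """Slope of the indifference curve: ratio of the two partial derivatives."""
    return mu[0] / mu[1]

bundles = [(10, 1), (6, 4), (4, 6), (13, 5), (5, 15), (2, 5)]
point = (6, 4)
print(same_ordering(u, v, bundles))        # True
print(mu_u(point), mu_v(point))            # (4, 6) (6912, 10368)
print(mrs(mu_u(point)), mrs(mu_v(point)))  # equal slopes
```

Only the ordering matters to the indifference map, which is why the marginal utility of an ordinal index carries no meaning of its own.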

It turns out to be possible to de- 
duce from indifference curves all of 
the theorems that were originally de- 
duced from cardinal utility measures. 
This banishing of cardinal utility was 
furthered considerably by splendid 
mathematical papers by Johnson 
(97) and Slutsky (177). (In modern 
economic theory, it is customary to 
think of an n-dimensional commodity 
space, and of indifference hyperplanes
in that space, each such hyperplane
having, of course, n-1 dimensions. In
order to avoid unsatisfactory
preference structures, it is necessary 
to assume that consumers always 
have a complete weak ordering for all 
commodity bundles, or points in com- 
modity space. Georgescu-Roegen 
[76], Wold [204, 205, 206, 208],
Houthakker [90], and Samuelson 
[167] have discussed this problem.) 

Pareto was not entirely consistent 
in his discussion of ordinal utility. 
Although he abandoned the assump- 
tion that its exact value could be 
known, he continued to talk about 
the sign of the marginal utility co- 
efficient, which assumed that some 
knowledge about the utility function 
other than purely ordinal knowledge 
was available. He also committed 
other inconsistencies. So Hicks and 
Allen (88), in 1934, were led to their 
classic paper in which they attempted 
to purge the theory of choice of its 
last introspective elements. They 
adopted the conventional economic 
view about indifference curves as de- 
termined from a sort of imaginary 
questionnaire, and proceeded to derive
all of the usual conclusions about
consumer demand with no reference
to the notion of even ordinal utility 
(though of course the notion of an 
ordinal scale of preferences was still 
embodied in their derivation of in- 
difference curves). This paper was 
for economics something like the behaviorist
revolution in psychology.

Lange (116), stimulated by Hicks 
and Allen, pointed out another incon- 
sistency in Pareto. Pareto had as- 
sumed that if a person considered 
four states, A, B, C, and D, he could 
judge whether the difference between 
the utilities of A and B was greater 
than, equal to, or less than the
difference between the utilities of C and D.
Lange pointed out that if such a 
comparison was possible for any A, 
B, C, and D, then utility was cardinally
measurable. Since it seems introspectively
obvious that such comparisons can be made,
this paper provoked a flood of protest and
comment (7, 22, 117, 147, 209). Nevertheless,
in spite of all the comment, and even in
spite of skepticism by a distinguished
economist as late as 1952 (153), Lange is
surely right. Psychologists should know
this at once; such comparisons are the
basis of the psychophysical Method of
Equal Sense Distances, from which 
an interval scale is derived. (Samuel- 
son [162] has pointed out a very in- 
teresting qualification. Not only 
must such judgments of difference be 
possible, but they must also be transi- 
tive in order to define an interval 
scale.) But since such judgments of 
differences did not seem to be neces- 
sary for the development of consumer 
demand theory, Lange’s paper did 
not force the reinstatement of cardi- 
nal utility. 
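Lange's argument can be made concrete. A positive linear transformation (a·u + b with a > 0) preserves the ordering of utility differences, while a transformation that merely preserves order need not; so if judgments of the form "the step from B to A is smaller than the step from D to C" are admitted, utility is pinned down to an interval scale. The utilities below are hypothetical.

```python
def diff_order(util, a, b, c, d):
    """Compare util[a] - util[b] with util[c] - util[d]: returns -1, 0, or 1."""
    x = util[a] - util[b]
    y = util[c] - util[d]
    return (x > y) - (x < y)

u = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 0.0}     # hypothetical utilities
affine = {s: 2.5 * x + 10.0 for s, x in u.items()}  # positive linear transform
cubed = {s: x ** 3 for s, x in u.items()}          # order-preserving, not linear

print(diff_order(u, "A", "B", "C", "D"))       # -1: A-B is the smaller step
print(diff_order(affine, "A", "B", "C", "D"))  # -1: preserved
print(diff_order(cubed, "A", "B", "C", "D"))   # 1: reversed
```

Both transforms rank A, B, C, and D identically, so an ordinal theory cannot tell them apart; only the difference judgments can.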

Indeed, the pendulum swung 
further in the behavioristic direction. 
Samuelson developed a new analytic 
foundation for the theory of consumer
behavior, the essence of which
is that indifference curves and hence 
the entire structure of the theory of 
consumer choice can be derived 
simply from observation of choices 
among alternative groups of pur- 
chases available to a consumer (160, 
161). This approach has been ex- 
tensively developed by Samuelson 
(164, 165, 167, 169) and others (50, 
90, 125, 126). The essence of the idea 
is that each choice defines a point 
and a slope in commodity space. 
Mathematical approximation meth- 
ods make it possible to combine a 
whole family of such slopes into an 
indifference hyperplane. A family of 
such hyperplanes forms an indifference
“map.”
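One way to make the revealed-preference idea concrete is Samuelson's weak axiom of revealed preference, a consistency condition on observed purchases: if bundle x was bought when a different bundle y was also affordable, then y should never be bought when x is affordable. The sketch below only tests that condition on a few invented price and quantity observations; it does not carry out the approximation step that builds indifference hyperplanes from many such choices.

```python
def cost(prices, bundle):
    return sum(p * q for p, q in zip(prices, bundle))

def directly_revealed_preferred(obs_i, obs_j):
    """The bundle chosen in obs_i is revealed preferred to the one chosen in
    obs_j if the latter was different and affordable at obs_i's prices."""
    p_i, x_i = obs_i
    _, x_j = obs_j
    return x_i != x_j and cost(p_i, x_j) <= cost(p_i, x_i)

def warp_violations(observations):
    """Index pairs (i, j) whose chosen bundles are each revealed preferred
    to the other, which the weak axiom forbids."""
    bad = []
    for i in range(len(observations)):
        for j in range(i + 1, len(observations)):
            if (directly_revealed_preferred(observations[i], observations[j])
                    and directly_revealed_preferred(observations[j], observations[i])):
                bad.append((i, j))
    return bad

# Each observation is (prices, chosen bundle) for two goods; invented data.
consistent = [((1, 2), (4, 1)), ((2, 1), (1, 4))]
inconsistent = [((1, 1), (3, 1)), ((1, 2), (1, 3))]
print(warp_violations(consistent))    # []
print(warp_violations(inconsistent))  # [(0, 1)]
```

In the inconsistent data the second good became relatively dearer, yet the consumer switched toward it even though the earlier bundle was still cheaper; no single indifference map can produce both choices.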

In a distinguished but inaccessible 
series of articles, Wold (204, 205, 206; 
see also 208 for a summary presenta- 
tion) has presented the mathematical 
content of the Pareto, Hicks and Al- 
len, and revealed preference (Samu- 
elson) approaches, as well as Cassel’s 
demand function approach, and has 
shown that if the assumption about 
complete weak ordering of bundles of 
commodities which was discussed above
is made, then all these approaches are
mathematically equivalent.

Nostalgia for cardinal utility. The 
crucial reason for abandoning cardi- 
nal utility was the argument of the 
ordinalists that indifference curve 
analysis in its various forms could do 
everything that cardinal utility could 
do, with fewer assumptions. So far 
as the theory of riskless choice is con- 
cerned, this is so. But this is only an 
argument for parsimony, and parsi- 
mony is not always welcome. There 
was a series of people who, for one 
reason or another, wanted to rein- 
state cardinal utility, or at least 
marginal utility. There were several 
mathematically invalid attempts to 
show that marginal utility could be
defined even in an ordinal-utility 
universe (23, 24, 163; 25, 114). 
Knight (110), in 1944, argued ex- 
tensively for cardinal utility; he 
based his arguments in part on in- 
trospective considerations and in part 
on an examination of psychophysical 
scaling procedures. He stimulated a 
number of replies (29, 42; 111). Re- 
cently Robertson (154) pleaded for 
the reinstatement of cardinal utility 
in the interests of welfare economics 
(this point will be discussed again 
below). But in general the indiffer- 
ence curve approach, in its various 
forms, has firmly established itself as 
the structure of the theory of riskless 
choice. 

Experiments on indifference curves. 
Attempts to measure marginal utility 
from market data were discussed 
above. There have been three experimental
attempts to measure indifference
curves. Schultz, who pioneered
in deriving statistical demand curves, 
interested his colleague at the Univer- 
sity of Chicago, the psychologist 
Thurstone, in the problem of
indifference curves. Thurstone (185)
performed a very simple experiment. 
He gave one subject a series of com- 
binations of hats and overcoats, and 
required the subject to judge whether 
he preferred each combination to a 
standard. For instance, the subject 
judged whether he preferred eight 
hats and eight overcoats to fifteen 
hats and three overcoats. The same 
procedure was repeated for hats and 
shoes, and for shoes and overcoats. 
The data were fitted with indifference 
curves derived from the assumptions 
that utility curves fitted Fechner’s 
Law and that the utilities of the 
various objects were independent. 
Thurstone says that Fechner’s Law 
fitted the data better than the other 
possible functions he considered, but 
presents no evidence for this assertion.
The crux of the experiment
was the attempt to predict the in- 
difference curves between shoes and 
overcoats from the other indifference 
curves. This was done by using the 
other two indifference curves to infer 
utility functions for shoes and for 
overcoats separately, and then using 
these two utility functions to predict 
the total utility of various amounts 
of shoes and overcoats jointly. The 
prediction worked rather well. The 
judgments of the one subject used are 
extraordinarily orderly; there is very 
little of the inconsistency and vari- 
ability that others working in this 
area have found. Thurstone says, 
“The subject ... was entirely naive 
as regards the psychophysical prob- 
lem involved and had no knowledge 
whatever of the nature of the curves 
that we expected to find” (185, p.
154). He adds, “I selected as subject
a research assistant in my laboratory
who knew nothing about psychophysics.
Her work was largely clerical in nature.
She had a very even disposition, and I
instructed her to take an even
motivational attitude on the successive
occasions ... I was surprised at the
consistency of the judgments that I
obtained, but I am pretty sure that they
were the result of careful instruction to
assume a uniform motivational attitude.”*
From
the economist’s point of view, the 
main criticism of this experiment is 
that it involved imaginary rather 
than real transactions (200). 
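The logic of Thurstone's prediction can be sketched as a simulation. Assumed here, for illustration only: each good's utility is k·ln(amount) (a Fechner-type law with the threshold ignored), utilities add across goods (independence), and the weights k of the simulated subject are invented. Under that model the log-log slope of an indifference curve between two goods estimates the ratio of their weights, so the hats-overcoats and hats-shoes curves together predict the shoes-overcoats curve. Because the simulated subject obeys the model exactly, the prediction succeeds by construction; the point is the chain of inference, not Thurstone's data.

```python
import math

# Hypothetical simulated subject: utility k * ln(amount) per good (Fechner-type),
# summed across goods (independence). The weights are illustrative only.
TRUE_K = {"hats": 1.0, "overcoats": 2.0, "shoes": 1.5}

def matching_amount(k_x, k_y, x, std_x, std_y):
    """Amount y making (x, y) indifferent to the standard (std_x, std_y)
    when utility is k_x*ln(x) + k_y*ln(y)."""
    log_y = math.log(std_y) + (k_x / k_y) * (math.log(std_x) - math.log(x))
    return math.exp(log_y)

def slope_in_logs(p, q):
    """-d ln(y) / d ln(x) between two indifferent bundles; estimates k_x / k_y."""
    (x1, y1), (x2, y2) = p, q
    return -(math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))

def simulated_pair(good_x, good_y, std=(8.0, 8.0), x=15.0):
    """Two bundles the simulated subject would judge equally good."""
    kx, ky = TRUE_K[good_x], TRUE_K[good_y]
    return std, (x, matching_amount(kx, ky, x, *std))

# Step 1: estimate weight ratios from the two "observed" pairings.
r_hats_over = slope_in_logs(*simulated_pair("hats", "overcoats"))  # k_h / k_o
r_hats_shoes = slope_in_logs(*simulated_pair("hats", "shoes"))     # k_h / k_s

# Step 2: predict the shoes-overcoats trade-off without observing it.
predicted_shoes_over = r_hats_over / r_hats_shoes                  # k_s / k_o

# Step 3: compare with what the simulated subject actually does.
observed_shoes_over = slope_in_logs(*simulated_pair("shoes", "overcoats"))
print(predicted_shoes_over, observed_shoes_over)  # both about 0.75
```

With real judgments the third pairing is a genuine test: if either assumption (the Fechner form or independence) is wrong, step 2 and step 3 will disagree.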

The second experimental measure- 
ment of indifference curves is reported 
by the economists Rousseas and Hart 
(197). They required large numbers 
of students to rank sets of three
combinations of different amounts of
bacon and eggs. By assuming that all
students had the same indifference
curves, they were able to derive a
composite indifference map for bacon and
eggs. No mathematical assumptions
were necessary, and the indifference
map is not given mathematical form.
Some judgments were partly or
completely inconsistent with the final map,
but not too many. The only conclusion
which this experiment justifies is
that it is possible to derive such a
composite indifference map.

* Thurstone, L. L. Personal communication,
December 7, 1953.

The final attempt to measure an 
indifference curve is a very recent one 
by the psychologists Coombs and 
Milholland (49). The indifference 
curve involved is one between risk 
and value of an object, and so will be 
discussed below in the section on the 
theory of risky decisions. It is men- 
tioned here because the same meth- 
ods (which show only that the in- 
difference curve is convex to the 
origin, and so perhaps should not be 
called measurement) could equally 
well be applied to the determination 
of indifference curves in riskless
situations. 

Mention should be made of the 
extensive economic work on statisti- 
cal demand curves. For some reason 
the most distinguished statistical de- 
mand curve derivers feel it necessary 
to give an account of consumer's 
choice theory as a preliminary to the 
derivation of their empirical demand 
curves. The result is that the two 
best books in the area (172, 182) are 
each divided into two parts; the first 
is a general discussion of the theory 
of consumer's choice and the second 
a quite unrelated report of statistical 
economic work. Stigler (179) has 
given good reasons why the statistical 
demand curves are so little related to 
the demand curves of economic 
theory, and Wallis and Friedman 
(200) argue plausibly that this state 
of affairs is inevitable. At any rate,
there seems to be little prospect of 
using large-scale economic data to fill 
in the empirical content of the theory 
of individual decision making. 

Psychological comments. There are
several commonplace observations that are
likely to occur to psychologists as soon
as they try to apply the theory of
riskless choices to actual experimental
work. The first is that human beings are
neither perfectly consistent nor perfectly
sensitive. This means that indifference
curves are likely to be observable as
indifference regions, or as probability
distributions of choice around a central
locus. It would be easy to assume that
each indifference curve represents the
modal value of a normal sensitivity curve,
and that choices should have statistical
properties predictable from that
hypothesis as the amounts of the
commodities (locations in product space)
are changed. This implies that the
definition of indifference between two
collections of commodities should be that
each collection is preferred over the
other 50 per cent of the time. Such a
definition has been proposed by an
economist (108), and used in experimental
work by psychologists (142). Of course, 50
per cent choice has been a standard
psychological definition of indifference
since the days of Fechner.

Incidentally, failure on the part of an
economist to understand that a just
noticeable difference (j.n.d.) is a
statistical concept has led him to argue
that the indifference relation is
intransitive, that is, that if A is
indifferent to B and B is indifferent to
C, then A need not be indifferent to C
(8, 9, 10). He argues that if A and B are
less than one j.n.d. apart, then A will be
indifferent to B; the same of course is
true of B and C; but A and C may be more
than one j.n.d. apart,
and so one may be preferred to the 
other. This argument is, of course, 
wrong. If A has slightly more utility 
than B, then the individual will 
choose A in preference to B slightly 
more than 50 per cent of the time, 
even though A and B are less than 
one j.n.d. apart in utility. The 50 per 
cent point is in theory a precisely 
defined point, not a region. It may in 
fact be difficult to determine because 
of inconsistencies in judgments and 
because of changes in taste with time. 
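The statistical definition of indifference, and the error in the j.n.d. argument, can be illustrated with a Thurstone-style choice model. Assumptions of this sketch: each alternative's momentary utility is its mean plus independent normal noise with standard deviation sigma, and a j.n.d. is taken to be the utility gap at which the better alternative is chosen 75 per cent of the time (one common psychophysical convention). The utilities are hypothetical.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_choose(u_a, u_b, sigma=1.0):
    """P(A chosen over B); the difference of two independent N(0, sigma^2)
    noise terms has standard deviation sigma * sqrt(2)."""
    return normal_cdf((u_a - u_b) / (sigma * math.sqrt(2.0)))

def jnd(sigma=1.0, criterion=0.75):
    """Utility gap at which the better option is chosen `criterion` of the
    time, found by bisection (p_choose increases with the gap)."""
    lo, hi = 0.0, 10.0 * sigma
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if p_choose(mid, 0.0, sigma) < criterion:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

u = {"A": 0.2, "B": 0.1, "C": 0.0}  # every pairwise gap is far below one j.n.d.
print(round(jnd(), 3))               # about 0.954
print(p_choose(u["A"], u["B"]))      # a little above 0.5
print(p_choose(u["A"], u["C"]))      # further above 0.5
print(p_choose(u["B"], u["B"]))      # exactly 0.5: indifference
```

Even though A, B, and C all lie within one j.n.d. of each other, none of the pairs is chosen exactly 50 per cent of the time, and indifference (the 50 per cent point) holds only for equal utilities, so it remains transitive.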

The second psychological observa- 
tion is that it seems impossible even 
to dream of getting experimentally 
an indifference map in n-dimensional 
space where n is greater than 3. Even
the case of n=3 presents formidable
experimental problems. This is less
important to the psychologist who 
wants to use the theory of choice to 
rationalize experimental data than 
to the economist who wants to de- 
rive a theory of general static equilib- 
rium. 

Experiments like Thurstone’s (185) 
involve so many assumptions that it 
is difficult to know what their empiri- 
cal meaning might be if these assump- 
tions were not made. Presumably, 
the best thing to do with such ex- 
periments is to consider them as tests 
of the assumption with the least face 
validity. Thurstone was willing to
assume utility maximization and
independence of the commodities involved
(incidentally, his choice of commodities
seems singularly unfortunate for
justifying an assumption of independent
utilities), and so used his data to
construct a utility function. Of course,
if only ordinal utility is assumed, then
experimental indifference curves cannot
be used this way. In fact, in an
ordinal-utility universe neither of the
principal assumptions made by Thurstone
can be tested by means of experimental
indifference curves. So the
assumption of cardinal utility, though 
not necessary, seems to lead to con- 
siderably more specific uses for ex- 
perimental data. 

At any rate, from the experimental
point of view the most interesting 
question is: What is the observed 
shape of indifference curves between 
independent commodities? This ques- 
tion awaits an experimental answer. 

The notion of utility is very similar 
to the Lewinian notion of valence 
(120, 121). Lewin conceives of 
valence as the attractiveness of an 
object or activity to a person (121). 
Thus, psychologists might consider 
the experimental study of utilities to 
be the experimental study of valences, 
and therefore an attempt at quantify- 
ing parts of the Lewinian theoretical 
schema. 


APPLICATION OF THE THEORY OF RISKLESS CHOICES TO WELFARE ECONOMICS*

The classical utility theorists as- 
sumed the existence of interpersonally 
comparable cardinal utility. They 
were thus able to find a simple answer
to the question of how to determine the
best economic policy: That economic
policy is best which results in the
maximum total utility, summed over all
members of the economy.

The abandonment of interpersonal
comparability makes this answer useless.
A sum is meaningless if the units being
summed are of varying sizes and there is
no way of reducing them to some common
size. This point has not been universally
recognized, and certain economists (e.g.,
82, 154) still defend cardinal (but not
interpersonally comparable) utility on
grounds of its necessity for welfare
economics.

* The discussion of welfare economics given
in this paper is exceedingly sketchy. For a
picture of what the complexities of modern
welfare economics are really like, see 11,
13, 14, 86, 118, 124, 127, 139, 140, 148,
154, 155, 166, 174.

Pareto's principle. The abandon- 
ment of interpersonal comparability 
and then of cardinal utility produced 
a search for some other principle to 
justify economic policy. Pareto 
(146), who first abandoned cardinal 
utility, provided a partial solution. 
He suggested that a change should 
be considered desirable if it left 
everyone at least as well off as he 
was before, and made at least one 
person better off. 
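Pareto's criterion needs only within-person comparisons, which is why it survives the abandonment of interpersonal comparability. A minimal sketch, assuming each person's well-being before and after a change is given as that person's own ordinal index (the people and numbers are hypothetical):

```python
def pareto_improvement(before, after):
    """Pareto's criterion: nobody worse off and at least one person better off.

    before/after map each person to that person's own ordinal well-being
    index; values are compared only within a person, never across people.
    """
    if before.keys() != after.keys():
        raise ValueError("both states must cover the same people")
    no_one_worse = all(after[p] >= before[p] for p in before)
    someone_better = any(after[p] > before[p] for p in before)
    return no_one_worse and someone_better

print(pareto_improvement({"ann": 1, "bob": 1}, {"ann": 2, "bob": 1}))  # True
print(pareto_improvement({"ann": 1, "bob": 1}, {"ann": 3, "bob": 0}))  # False
```

A False result does not mean the change is undesirable; it means the principle has nothing to say about it, which is the case for nearly every change that helps some people and hurts others.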

Compensation principle. Pareto’s
principle is fine as far as it goes, but
it obviously does not go very far.
The economic decisions which can be
made on so simple a principle are few
and insignificant. So welfare economics
languished until Kaldor (98) proposed
the compensation principle. This
principle is that if it is
possible for those who gain from an 
economic change to compensate the 
losers for their losses and still have 
something left over from their gains, 
then the change is desirable. Of 
course, if the compensation is actually 
paid, then this is simply a case of 
Pareto’s principle. 

But Kaldor asserted that the
compensation need not actually be
made; all that was necessary was
that it could be made. The fact that
it could be made, according to
Kaldor, is evidence that the change
produces an excess of good over harm,
and so is desirable. Scitovsky (173)
observed an inconsistency in Kaldor’s
position: Some cases could arise in
which, when a change from A to B
has been made because of Kaldor’s
criterion, then a change back from B
to A would also satisfy Kaldor’s
criterion. It is customary, therefore,
to assume that changes which meet 
the original Kaldor criterion are only 
desirable if the reverse change does 
not also meet the Kaldor criterion. 

It has gradually become obvious 
that the Kaldor-Scitovsky criterion 
does not solve the problem of welfare 
economics (see e.g., 18, 99). It as- 
sumes that the unpaid compensation 
does as much good to the person who 
gains it as it would if it were paid to 
the people who lost by the change. 
For instance, suppose that an in- 
dustrialist can earn $10,000 a year 
more from his plant by using a new 
machine, but that the introduction of 
the machine throws two people ir- 
retrievably out of work. If the salary 
of each worker prior to the change 
was $4,000 a year, then the in- 
dustrialist could compensate the 
workers and still make a profit. But 
if he does not compensate the work- 
ers, then the added satisfaction he 
gets from his extra $10,000 may be 
much less than the misery he produces
in his two workers. This example only
illustrates the principle; it does not
make much sense in these days of
progressive income taxes, unemployment
compensation, high employment, and
strong unions.
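The industrialist example can be worked through to show how the Kaldor test and a utility sum can disagree. The $10,000 gain and the two $4,000 salaries come from the example; the industrialist's base income and the workers' $500 of other income are hypothetical, added only so that logarithms are defined. Summing log incomes presumes interpersonally comparable cardinal utility, the very assumption the text says was abandoned, so the second number is an illustration of the worry, not a legitimate welfare measure.

```python
import math

# Incomes in dollars per year (see lead-in for which figures are hypothetical).
before = {"industrialist": 100_000, "worker_1": 4_500, "worker_2": 4_500}
after = {"industrialist": 110_000, "worker_1": 500, "worker_2": 500}

money_changes = {p: after[p] - before[p] for p in before}

def kaldor_passes(changes):
    """Gainers could fully compensate losers and still be ahead."""
    return sum(changes.values()) > 0

def log_welfare_change(inc_before, inc_after):
    """Change in summed log income; only meaningful if utilities are
    cardinal and interpersonally comparable (an assumption of this sketch)."""
    return sum(math.log(inc_after[p]) - math.log(inc_before[p]) for p in inc_before)

print(money_changes)                              # +10000, -4000, -4000
print(kaldor_passes(money_changes))               # True: $2,000 to spare
print(round(log_welfare_change(before, after), 2))  # about -4.3: summed utility falls
```

The change passes the compensation test because compensation could be paid, yet with no compensation actually paid the assumed utility sum falls sharply.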

Social welfare functions. From here 
on the subject of welfare economics 
gets too complicated and too remote 
from psychology to merit extensive
exploration in this paper. The line
that it has taken is the assumption 
of a social welfare function (21), a 
function which combines individual 
utilities in a way which satisfies 
Pareto’s principle but is otherwise 
undefined. In spite of its lack of 
definition, it is possible to draw 
certain conclusions from such a func- 
tion (see e.g., 164). However, Arrow 
(14) has recently shown that a social 
welfare function that meets certain 
very reasonable requirements about
being sensitive in some way to the 
wishes of all the people affected, 
etc., cannot in general be found in 
the absence of interpersonally comparable
utilities (see also 89).

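Arrow's theorem itself is too general to compress into a few lines, but the classic voting paradox usually credited to Condorcet shows the kind of difficulty involved: majority rule, a seemingly reasonable way of combining individual orderings, can turn three transitive individual orderings into an intransitive social one. The sketch illustrates that one aggregation rule only; it is not a proof of Arrow's result, and the voters are hypothetical.

```python
from itertools import permutations

# Three voters, each with a perfectly transitive ranking (best first).
voters = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]

def majority_prefers(x, y, rankings):
    """True if a strict majority ranks x above y."""
    wins = sum(r.index(x) < r.index(y) for r in rankings)
    return wins > len(rankings) / 2

majority_pairs = [(x, y) for x, y in permutations("ABC", 2)
                  if majority_prefers(x, y, voters)]
print(majority_pairs)  # [('A', 'B'), ('B', 'C'), ('C', 'A')]: a cycle
```

Each voter's ordering is a weak ordering in the sense used earlier, but the majority relation is not, so majority rule cannot serve as a social welfare function that always yields a consistent ranking.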
Psychological comment. Some econ- 
omists are willing to accept the 
fact that they are inexorably com- 
mitted to making moral judgments 
when they recommend economic 
policies (e.g., 152, 153). Others still 
long for the impersonal amorality of a 
utility measure (e.g., 154). However 
desirable interpersonally comparable 
cardinal utility may be, it seems 
utopian to hope that any experi- 
mental procedure will ever give in- 
formation about individual utilities 
that could be of any practical use in 
guiding large-scale economic policy. 


THE THEORY OF RISKY CHOICES

Risk and uncertainty. Economists and
statisticians distinguish between risk
and uncertainty. There does not seem to
be any general agreement about which
concept should be associated with which
word, but the following definitions make
the most important distinctions.

§ Strotz (183) and Alchian (1) present
nontechnical and sparkling expositions of
the von Neumann and Morgenstern utility
measurement proposals. Georgescu-Roegen
(78) critically discusses various axiom
systems so as to bring some of the
assumptions underlying this kind of
cardinal utility into clear focus. Allais
(3) reviews some of these ideas in the
course of criticizing them. Arrow (12, 14)
reviews parts of the field.

There is a large psychological literature
on one kind of risky decision making, the
kind which results when psychologists use
partial reinforcement. This literature has
been reviewed by Jenkins and Stanley (96).
Recently a number of experimenters,
including Jarrett (95), Flood (69, 70),
Bilodeau (27), and myself (56) have been
performing experiments on human subjects
who are required to choose repetitively
between two or more alternatives, each of
which has a probability of reward greater
than zero and less than one. The problems
raised by these experiments are too
complicated and too far removed from
conventional utility theory to be dealt
with in this paper. This line of
experimentation may eventually provide
the link which ties together utility
theory and reinforcement theory.

Almost everyone would agree that 
when I toss a coin the probability 
that I will get a head is .5. A proposi- 
tion about the future to which a num- 
ber can be attached, a number that 
represents the likelihood that the 
proposition is true, may be called a 
first-order risk. What the rules are for 
attaching such numbers is a much 
debated question, which will be 
avoided in this paper. 

Some propositions may depend on more than one probability distribution. For instance, I may decide that if I get a tail, I will put the coin back in my pocket, whereas if I get a head, I will toss it again. Now, the probability of the proposition “I will get a head on my second toss” is a function of two probability distributions, the distribution corresponding to the first toss and that corresponding to the second toss. This might be called a second-order risk. Similarly, risks of any order may be constructed. It is a mathematical characteristic of all higher-order risks that they may be compounded into first-order risks by means of the usual theorems for compounding probabilities. (Some economists have argued against this procedure [83], essentially on the grounds that you may have more information by the time the second risk comes around. Such problems can best be dealt with by means of von Neumann and Morgenstern's [197] concept of strategy, which is discussed below. They become in general problems of uncertainty, rather than risk.)
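The compounding of the coin example into a first-order risk can be made concrete. The following is a minimal Python sketch; the function name `compound` and the dictionary layout are illustrative choices, not notation from the literature cited here. Each path's probability is the product of its stage probabilities, and mutually exclusive paths leading to the same outcome are added.

```python
from fractions import Fraction

def compound(two_stage):
    """Collapse a second-order risk into a first-order risk.

    two_stage is a list of (p, stage2) pairs: p is the probability of a
    first-stage result, and stage2 maps each final outcome to its
    conditional probability given that result.
    """
    first_order = {}
    for p, stage2 in two_stage:
        for outcome, q in stage2.items():
            # Multiply along the path, then add over mutually exclusive paths.
            first_order[outcome] = first_order.get(outcome, 0) + p * q
    return first_order

half = Fraction(1, 2)
coin_plan = [
    (half, {"no second toss": Fraction(1)}),                              # tail: coin back in pocket
    (half, {"head on second toss": half, "tail on second toss": half}),  # head: toss again
]
print(compound(coin_plan)["head on second toss"])  # 1/4
```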

Some propositions about the future 
exist to which no generally accepted 
probabilities can be attached. What 
is the probability that the following
proposition is true: Immediately 
after finishing this paper, you will 
drink a glass of beer? Surely it is 
neither impossible nor certain, so it 
ought to have a probability between 
zero and one, but it is impossible for 
you or me to find out what that prob- 
ability might be, or even to set up 
generally acceptable rules about how 
to find out. Such propositions are 
considered cases of uncertainty, rather 
than of risk. This section deals only 
with the subject of first-order risks. 
The subject of uncertainty will arise 
again in connection with the theory 
of games. 

Expected utility maximization. The traditional mathematical notion for dealing with games of chance (and so with risky decisions) is the notion that choices should be made so as to maximize expected value. The expected value of a bet is found by multiplying the value of each possible outcome by its probability of occurrence and summing these products across all possible outcomes. In symbols:

$$EV = p_1\$_1 + p_2\$_2 + \cdots + p_n\$_n,$$

where p stands for probability, $ stands for the value of an outcome, and $p_1 + p_2 + \cdots + p_n = 1$.
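The expected-value formula translates directly into code. This is a minimal sketch, assuming a bet is represented as a list of (probability, value) pairs; exact fractions are used so that the check that the probabilities sum to 1 is not upset by floating-point rounding.

```python
from fractions import Fraction

def expected_value(bet):
    """EV = p_1*$_1 + p_2*$_2 + ... + p_n*$_n for a bet given as
    (probability, value) pairs whose probabilities sum to 1."""
    if sum(p for p, _ in bet) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(p * value for p, value in bet)

# A 50-50 chance of $10.00 or nothing.
even_bet = [(Fraction(1, 2), Fraction(10)), (Fraction(1, 2), Fraction(0))]
print(expected_value(even_bet))  # 5
```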

The assumption that people actually behave the way this mathematical notion says they should is contradicted by observable behavior in many risky situations. People are willing to buy insurance, even though the person who sells the insurance makes a profit. People are willing to buy lottery tickets, even though the lottery makes a profit. Consideration of the problem of insurance and of the St. Petersburg paradox led Daniel Bernoulli, an eighteenth-century mathematician, to propose that they could be resolved by assuming that people act so as to maximize expected utility, rather than expected value (26). (He also assumed that utility followed a function that more than a century later was proposed by Fechner for subjective magnitudes in general and is now called Fechner's Law.) This was the first use of the notion of expected utility.
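Bernoulli's resolution of the St. Petersburg paradox can be illustrated numerically. The sketch below assumes one common statement of the game (a fair coin is tossed until the first head; a first head on toss k, with probability 2**-k, pays 2**k units) and a logarithmic utility of the payoff alone. Ignoring the player's existing fortune is a simplification of Bernoulli's own treatment, which took it into account.

```python
import math

def st_petersburg(n_outcomes):
    """Expected value and expected log utility of the St. Petersburg game,
    truncated after n_outcomes possible outcomes.

    Assumed convention: the first head on toss k (probability 2**-k) pays 2**k.
    """
    ev = 0.0
    eu = 0.0
    for k in range(1, n_outcomes + 1):
        p = 0.5 ** k
        payoff = 2 ** k
        ev += p * payoff            # every term contributes exactly 1
        eu += p * math.log(payoff)  # logarithmic utility of the payoff
    return ev, eu

for n in (10, 100, 1000):
    ev, eu = st_petersburg(n)
    # exp(eu) is the sure payoff with the same log utility (a certainty equivalent).
    print(n, ev, round(math.exp(eu), 4))
```

The truncated expected value equals the number of outcomes allowed and so grows without bound, while the certainty equivalent under logarithmic utility settles near 4.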

The literature on risky decision making prior to 1944 consists primarily of the St. Petersburg paradox and other gambling and probability literature in mathematics, some literary discussion in economics (e.g., 109, 187), one economic paper on lotteries (189), and the early literature of the theory of games (31, 32, 33, 34, 195), which did not use the notion of utility. The modern period in the study of risky decision making began with the publication in 1944 of von Neumann and Morgenstern's monumental book Theory of Games and Economic Behavior (196, see also 197), which we will discuss more fully later. Von Neumann and Morgenstern pointed out that the usual assumption that economic man can always say whether he prefers one state to another or is indifferent between them needs only to be slightly
modified in order to imply cardinal 
utility. The modification consists of 
adding that economic man can also 
completely order probability com- 
binations of states. Thus, suppose 
that an economic man is indifferent 
between the certainty of $7.00 and a 
50-50 chance of gaining $10.00 or 
nothing. We can assume that his 
indifference between these two pros- 
pects means that they have the same 
utility for him. We may define the 
utility of $0.00 as zero utiles (the 
usual name for the unit of utility, just 
as sone is the name for the unit of 
auditory loudness), and the utility 
of $10.00 as 10 utiles. These two 
arbitrary definitions correspond to
defining the two undefined constants 
which are permissible since cardinal 
utility is measured only up to a linear 
transformation. Then we may cal- 
culate the utility of $7.00 by using 
the concept of expected utility as fol- 
lows: 


$$U(\$7.00) = .5\,U(\$10.00) + .5\,U(\$0.00) = .5(10) + .5(0) = 5.$$


Thus we have determined the cardi- 
nal utility of $7.00 and found that it 
is 5 utiles. By varying the probabil- 
ities and by using the already found 
utilities it is possible to discover the 
utility of any other amount of money, 
using only the two permissible arbi- 
trary definitions. It is even more 
convenient if instead of +$10.00, -$10.00 or some other loss is used as one of the arbitrary utilities.
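The chaining procedure, in which each newly found utility is reused to find the next, can be sketched in a few lines. This is a minimal illustration, assuming amounts are keyed in cents and that the gambles have two outcomes; the $7.00 judgment is the one in the text, while the $4.00 judgment is invented purely for illustration.

```python
from fractions import Fraction

def utility_of_sure_amount(p, u_better, u_worse):
    """Utility of a sure amount judged indifferent to a gamble that gives the
    better outcome with probability p and the worse outcome otherwise."""
    return p * u_better + (1 - p) * u_worse

# Keys are amounts in cents. The two permissible arbitrary definitions:
U = {0: Fraction(0), 1000: Fraction(10)}

# Indifference from the text: $7.00 for sure ~ 50-50 chance of $10.00 or $0.00.
U[700] = utility_of_sure_amount(Fraction(1, 2), U[1000], U[0])

# Hypothetical further judgment, reusing an already found utility:
# $4.00 for sure ~ 50-50 chance of $7.00 or $0.00.
U[400] = utility_of_sure_amount(Fraction(1, 2), U[700], U[0])

print(U[700], U[400])  # 5 5/2
```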

A variety of implications is em- 
bodied in this apparently simple no- 
tion. In the attempt to examine and 
exhibit clearly what these implica- 
tions are, a number of axiom systems, 
differing from von Neumann and 
Morgenstern’s but leading to the 
same result, have been developed 
(73, 74, 85, 135, 136, 171). This 
paper will not attempt to go into 
the complex discussions (e.g., 130, 
131, 168, 207) of these various al- 
ternative axiom systems. One recent 
discussion of them (78) has con- 
cluded, on reasonable grounds, that 
the original von Neumann and Mor- 
genstern set of axioms is still the best. 

It is profitable, however, to ex- 
amine what the meaning of this no- 
tion is from the empirical point of 
view if it is right. First, it means that 
risky propositions can be ordered 
in desirability, just as riskless ones 
can. Second, it means that the con- 
cept of expected utility is behavior- 
ally meaningful. Finally, it means 
that choices among risky alternatives 
are made in such a way that they
maximize expected utility. 

If this model is to be used to pre- 
dict actual choices, what could go 
wrong with it? It might be that the 
probabilities by which the utilities 
are multiplied should not be the ob- 
jective probabilities; in other words, a 
decider’s estimate of the subjective 
importance of a probability may not 
be the same as the numerical value of 
that probability. It might be that 
the method of combination of proba- 
bilities and values should not be 
simple multiplication. It might be 
that the method of combination of 
the probability-value products should 
not be simple addition. It might be 
that the process of gambling has 
some positive or negative utility of 
its own. It might be that the whole 
approach is wrong, that people just 
do not behave as if they were trying 
to maximize expected utility. We 
shall examine some of these pos- 
sibilities in greater detail below. 

Economic implications of maximizing expected utility. The utility-measurement notions of von Neumann and Morgenstern were enthusiastically welcomed by many economists (e.g., 73, 193), though a few (e.g., 19) were at least temporarily (20) unconvinced. The most interesting economic use of them was proposed by Friedman and Savage (73), who were concerned with the question of why the same person who buys insurance (with a negative expected money value), and therefore is willing to pay in order not to take risks, will also buy lottery tickets (also with a negative expected money value), in which he pays in order to take risks. They suggested that these facts could be reconciled by a doubly inflected utility curve for money, like that in Fig. 2. If J represents the person's current income, then he is clearly willing to accept “fair” insurance (i.e., insurance with zero expected money value) because the serious loss against which he is insuring would have a lower expected utility than the certain loss of the insurance premium. (Negatively accelerated total utility curves, like that from the origin to J, are what you get when marginal utility decreases; thus, decreasing marginal utility is consistent with the avoidance of risks.) The person would also be willing to buy lottery tickets, since the expected utility of the lottery ticket is greater than the utility of the certain loss of its cost, because of the rapid increase in the height of the utility function. Other considerations make it necessary that the utility curve turn down again. Note that this discussion assumes that gambling has no inherent utility.

Fig. 2. Hypothetical Utility Curve for Money, Proposed by Friedman and Savage

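The Friedman-Savage argument can be checked numerically with a curve that is concave below current income and convex just above it. The functional forms, the income of 100, and the two fair prospects below are all hypothetical choices made for illustration; Friedman and Savage did not specify them, and this sketch omits the final downturn of their curve.

```python
import math

INCOME = 100.0  # hypothetical current income (J in Fig. 2)

def utility(w):
    """Hypothetical utility of wealth w: concave below INCOME, convex just above it.
    These functional forms are illustrative assumptions."""
    if w <= INCOME:
        return math.sqrt(w)                                # decreasing marginal utility
    return math.sqrt(INCOME) + ((w - INCOME) / 20.0) ** 2  # increasing marginal utility

def expected_utility(prospect):
    return sum(p * utility(w) for p, w in prospect)

# Fair insurance: a 10% chance of losing 75, versus a sure premium of 0.1 * 75 = 7.5.
uninsured = [(0.9, INCOME), (0.1, INCOME - 75)]
buys_insurance = utility(INCOME - 7.5) > expected_utility(uninsured)

# Fair lottery ticket: costs 1 and pays 100 with probability 0.01.
ticket = [(0.01, INCOME - 1 + 100), (0.99, INCOME - 1)]
buys_ticket = expected_utility(ticket) > utility(INCOME)

print(buys_insurance, buys_ticket)  # True True
```

Both prospects have zero expected money value, yet the same hypothetical person insures against the loss and buys the ticket, because the loss falls on the concave segment and the prize reaches into the convex one.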
Markowitz (132) suggested an important modification in this hypothesis. He suggested that the origin of a person's utility curve for money be taken as his customary financial status, and that on both sides of the origin the curve be assumed first concave and then convex. If the person's customary state of wealth changes, then the shape of his utility curve will thus remain generally the same with respect to where he now is, and so his risk-taking behavior will remain pretty much the same instead of changing with every change of wealth as in the Friedman-Savage formulation.

Criticism of the expected-utility maximization theory. It is fairly easy to construct examples of behavior that violate the von Neumann-Morgenstern axioms (for a particularly ingenious example, see 183). It is especially easy to do so when the amounts of money involved are very large, or when the probabilities or probability differences involved are extremely small. Allais (5) has constructed a questionnaire full of items of this type. For an economist interested in using these axioms as a basis for a completely general theory of risky choice, these examples may be significant. But psychological interest in this model is more modest. The psychologically important question is: Can such a model be used to account for simple experimental examples of risky decisions?

Of course a utility function derived by von Neumann-Morgenstern means is not necessarily the same as a classical utility function (74, 203; see also 82).

Experiment on the von Neumann-Morgenstern model. A number of experiments on risky decision making have been performed. Only the first of them, by Mosteller and Nogee (142), has been in the simple framework of the model described above. All the rest have in some way or another centered on the concept of probabilities effective for behavior which differ in some way from the objective probabilities, as well as on utilities different from the objective values of the objects involved.

WARD EDWARDS

Mosteller and Nogee (142) carried 
out the first experiment to apply the 
von Neumann-Morgenstern model. 
They presented Harvard undergradu- 
ates and National Guardsmen with 
bets stated in terms of rolls at poker 
dice, which each subject could accept 
or refuse. Each bet gave a “hand” 
at poker dice. If the subject could 
beat the hand, he won an amount 
stated in the bet. If not, he lost a 
nickel. Subjects played with $1.00, which they were given at the beginning of each experimental session. They were run together in groups of five, but each decided and rolled the
poker dice for himself. Subjects were 
provided with a table in which the 
mathematically fair bets were shown, 
so that a subject could immediately 
tell by referring to the table whether 
a given bet was fair, or better or 
worse than fair. 
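The table of mathematically fair bets follows from a single condition: the probability of winning times the amount won must equal the probability of losing times the nickel lost. A minimal sketch, assuming hypothetical probabilities of beating a hand (the actual poker-dice hands and probabilities are not reproduced here); the function names are illustrative.

```python
from fractions import Fraction

NICKEL = Fraction(5, 100)  # amount lost, in dollars, when the hand is not beaten

def fair_win(p_beat):
    """Winning amount that gives the bet zero expected money value:
    p*win - (1 - p)*NICKEL == 0."""
    return NICKEL * (1 - p_beat) / p_beat

def compare_to_fair(offer, p_beat):
    fair = fair_win(p_beat)
    if offer == fair:
        return "fair"
    return "better than fair" if offer > fair else "worse than fair"

# Hypothetical probabilities of beating a hand.
for p in (Fraction(1, 2), Fraction(1, 5), Fraction(1, 10)):
    print(p, float(fair_win(p)))
```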

In the data analysis, the first step was the determination of “indifference offers.” For each probability used and for each player, the amount of money was found for which that player would accept the bet 50 per cent of the time. Thus equality was defined as 50 per cent choice, as it is likely to be in all psychological experiments of this sort. Then the utility of $0.00 was defined as 0 utiles, and the utility of losing a nickel was defined as -1 utile. With these definitions and the probabilities involved, it was easy to calculate the utility corresponding to the amount of money involved in the indifference offer. It turned out that, in general, the Harvard undergraduates had diminishing marginal utilities, while the National Guardsmen had increasing marginal utilities.
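The calculation is easy because, at an indifference offer, accepting the bet has the same expected utility as refusing it (keeping $0.00, or 0 utiles). With U(losing a nickel) = -1, this gives p·U(offer) - (1 - p) = 0, so U(offer) = (1 - p)/p. The sketch below applies this to a set of hypothetical indifference offers for one player (not Mosteller and Nogee's data) and compares the result with the utility a player whose utility is linear in money would have.

```python
from fractions import Fraction

def utility_of_offer(p_win):
    """Utility (utiles) of the indifference offer for a bet won with probability p_win,
    given U($0.00) = 0 and U(losing a nickel) = -1.
    Indifference: p*U(offer) + (1 - p)*(-1) = U($0.00) = 0, so U(offer) = (1 - p)/p."""
    return (1 - p_win) / p_win

# Hypothetical indifference offers, in cents, for a single player.
offers = {Fraction(1, 2): 5, Fraction(1, 5): 25, Fraction(1, 10): 60}
for p, cents in offers.items():
    utiles = utility_of_offer(p)
    linear = Fraction(cents, 5)  # utiles if every nickel were worth exactly 1 utile
    print(f"p={p}: U({cents} cents) = {utiles} utiles; linear utility would give {linear}")
```

For these made-up offers utility rises more slowly than money (4 utiles rather than 5 for 25 cents, 9 rather than 12 for 60 cents), which is the diminishing-marginal-utility pattern described for the Harvard undergraduates.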
The utilities thus calculated were
used in predicting the results of more 
complex bets. It is hard to evaluate 
the success of these predictions. At 
any rate, an auxiliary paired- 
comparisons experiment showed that 
the hypothesis that subjects maxi- 
mized expected utility predicted 
choices better than the hypothesis 
that subjects maximized expected 
money value. 

The utility curve that Mosteller 
and Nogee derive is different from 
the one Friedman and Savage (73) 
were talking about. Suppose that a 
subject's utility curve were of the 
Friedman-Savage type, as in Fig. 2, 
and that he had enough money to put 
him at point P. If he now wins or 
loses a bet, then he is moved to a 
different location on the indifference 
curve, say Q. (Note that the amounts 
of money involved are much smaller 
than in the original Friedman-Savage 
use of this curve.) However, the construction of a Mosteller-Nogee utility curve assumes that the individual is always at the same point on his utility curve, namely the origin. This means that the curve is really of the Markowitz (132) type discussed above, instead of the Friedman-Savage type. The curve is not really a curve of utility of money in general, but rather it is a curve of the utility-for-n-more-dollars. Even so, it must be assumed further that as the total amount of money possessed by the subject changes during the experiment, the utility-for-n-more-dollars curve does not change. Mosteller and
Nogee argue, on the basis of detailed 
examination of some of their data, 
that the amount of money possessed 
by the subjects did not seriously 
influence their choices. The utility 
curves they reported showed chang- 
ing marginal utility within the 
amounts of money used in their experiment. Consequently, their conclusion that the amount of money possessed by the subjects was not seriously important can only be true if their utility curves are utility-for-n-more-dollars curves and if the shapes of such curves are not affected by changes in the number of dollars on hand. This discussion exhibits a type of problem which must always arise in utility measurement and which is new in psychological scaling. The effects of previous judgments on present judgments are a familiar story in psychophysics, but they are usually assumed to be contaminating influences that can be minimized or eliminated by proper experimental design. In utility scaling, the fundamental idea of a utility scale is such that the whole structure of a subject's choices should be altered as a result of each previous choice (if the choices are real ones involving money gains or losses). The Markowitz solution to this problem is the most practical one available at present, and that solution is not entirely satisfactory, since all it does is to assume that people's utilities for money operate in such a way that the problem does not really exist. This assumption is plausible for money, but it gets rapidly less plausible when other commodities with a less continuous character are considered instead.

Probability preferences. In a series of recent experiments (55, 57, 58, 59), the writer has shown that subjects, when they bet, prefer some probabilities to others (57), and that these preferences cannot be accounted for by utility considerations (59). All the experiments were basically of the same design. Subjects were required to choose between pairs of bets according to the method of paired comparisons. The bets were of three kinds: positive expected value, negative expected value, and zero expected value. The two members of each pair of bets had the same expected value, so that there was never (in the main experiment [57, 59]) any objective reason to expect that choosing one bet would be more desirable than choosing the other.

Subjects made their choices under 
three conditions: just imagining they 
were betting; betting for worthless 
chips; and betting for real money. 
They paid any losses from their own 
funds, but they were run in extra 
sessions after the main experiment to 
bring their winnings up to $1.00 per 
hour. 
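The core of this design, pairs of bets that differ in probability but share an expected value, can be sketched as follows. The fixed loss of 80 cents and the expected values of +5, 0, and -5 cents are hypothetical; the amounts actually used are not given here, and holding the loss fixed while solving for the win is only one way to equate expected values. Probabilities are expressed in eighths, as in the 4/8 and 6/8 bets mentioned below.

```python
from fractions import Fraction
from itertools import combinations

def bet_with_ev(p_win, loss, ev):
    """Return (p_win, win, loss) with p_win*win - (1 - p_win)*loss == ev."""
    win = (ev + (1 - p_win) * loss) / p_win
    if win <= 0:
        raise ValueError("loss too small for this expected value and probability")
    return (p_win, win, loss)

def expected_value(bet):
    p, win, loss = bet
    return p * win - (1 - p) * loss

LOSS = Fraction(80)                            # hypothetical loss, in cents
PROBS = [Fraction(k, 8) for k in range(1, 8)]  # probabilities of winning, in eighths

design = {}
for kind, ev in (("positive", Fraction(5)), ("zero", Fraction(0)), ("negative", Fraction(-5))):
    bets = [bet_with_ev(p, LOSS, ev) for p in PROBS]
    design[kind] = list(combinations(bets, 2))  # every pair within one kind

print({kind: len(pairs) for kind, pairs in design.items()})
```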

The results showed that two factors were most important in determining choices: general preferences or dislikes for risk-taking, and specific preferences among probabilities. An example of the first kind of factor is that subjects strongly preferred low probabilities of losing large amounts of money to high probabilities of losing small amounts of money; they just didn't like to lose. It also turned out that on positive expected value bets, they were more willing to accept long shots when playing for real money than when just imagining or playing for worthless chips. An example of the second kind of factor is that they consistently preferred bets involving a 4/8 probability of winning to all others, and consistently avoided bets involving a 6/8 probability of winning. These preferences were reversed for negative expected value bets.

These results were independent of 
the amounts of money involved in 
the bets, so long as the condition of 
constant expected value was main- 
tained (59). When pairs of bets which 
differed from one another in expected 
value were used, the choices were a 
compromise between maximizing expected amount of money and betting
at the preferred probabilities (58). 
An attempt was made to construct 
individual utility curves adequate to 
account for the results of several sub- 
jects. For this purpose, the utility of 
$0.30 was defined as 30 utiles, and it 
was assumed that subjects cannot 
discriminate utility differences small- 
er than half a utile. Under these as- 
sumptions, no individual utility curves 
consistent with the data could be 
drawn. Various minor experiments 
showed that these results were relia- 
ble and not due to various possible 
artifacts (59). No attempt was made 
to generate a mathematical model of 
probability preferences. 

The existence of probability prefer- 
ences means that the simple von 
Neumann-Morgenstern method of 
utility measurement cannot succeed. 
Choices between bets will be deter- 
mined not only by the amounts of 
money involved, but also by the 
preferences the subjects have among 
the probabilities involved. Only an 
experimental procedure which holds 
one of these variables constant, or 
otherwise allows for it, can hope to 
measure the other. Thus my experi- 
ments cannot be regarded as a way 
of measuring probability preferences; 
they show only that such preferences 
exist. 

It may nevertheless be possible to 
get an interval scale of the utility of 
money from gambling experiments by 
designing an experiment which meas- 
ures utility and probability prefer- 
ences simultaneously. Such experi- 
ments are likely to be complicated 
and difficult to run, but they can be 
designed. 

Subjective probability. First, a 
clarification of terms is necessary. 
The phrase subjective probability has 
been used in two ways: as a name 
for a school of thought about the 
logical basis of mathematical probability (51, 52, 80) and as a name for
a transformation on the scale of 
mathematical probabilities which is 
somehow related to behavior. Only 
the latter usage is intended here. The 
clearest distinction between these 
two notions arises from considera- 
tion of what happens when an objec- 
tive probability can be defined (e.g., 
in a game of craps). If the subjective 
probability is assumed to be different 
from the objective probability, then 
the concept is being used in its sec- 
ond, or psychological, sense. Other 
terms with the same meaning have 
also been used: personal probability, 
psychological probability, expecta- 
tion (a poor term because of the 
danger of confusion with expected 
value). (For a more elaborate 
treatment of concepts in this area, 
see 192.) 

In 1948, prior to the Mosteller and 
Nogee experiment, Preston and 
Baratta (149) used essentially similar 
logic and a somewhat similar experi- 
ment to measure subjective prob- 
abilities instead of subjective values. 
They required subjects to bid com- 
petitively for the privilege of taking 
a bet. All bids were in play money, 
and the data consisted of the winning 
bids. If each winning bid can be con- 
sidered to represent a value of play 
money such that the winning bidder 
is indifferent between it and the bet 
he is bidding for, and if it is further 
assumed that utilities are identical 
with the money value of the play 
money and that all players have the 
same subjective probabilities, then 
these data can be used to construct a 
subjective probability scale. Preston and Baratta constructed such a scale. The subjects, according to the scale, overestimate low probabilities and underestimate high ones, with an indifference point (where subjective equals objective probability) at about 0.2. Griffith (81) found somewhat similar results in an analysis of parimutuel betting at race tracks, as did Attneave (17) in a guessing game, and Sprowls (178) in an analysis of various lotteries. The Mosteller and Nogee data (142) can, of course, be analyzed for subjective probabilities instead of subjective values. Mosteller and Nogee performed such an analysis and said that their results were in general agreement with Preston and Baratta's. However, Mosteller and Nogee found no indifference point for their Harvard students, whereas the National Guardsmen had an indifference point at about 0.5. They are not able to reconcile these differences in results.
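Under the assumptions listed above (utility equal to play-money value, identical subjective probabilities for all players), each winning bid yields one point on the subjective probability scale. A minimal sketch, further assuming that a bet pays a fixed prize with probability p and nothing otherwise; the bids are hypothetical numbers chosen only to mimic the qualitative pattern just described, not Preston and Baratta's data.

```python
def implied_subjective_probability(winning_bid, prize):
    """Subjective probability implied by a winning bid for a bet paying `prize`,
    assuming utility equals play-money value and the bidder is indifferent
    between not bidding and paying the bid for the bet:
        0 = s*(prize - bid) + (1 - s)*(-bid)   =>   s = bid / prize
    """
    return winning_bid / prize

# Hypothetical winning bids for a 100-unit prize.
bids = {0.05: 9.0, 0.25: 24.0, 0.75: 68.0}
for p, bid in bids.items():
    s = implied_subjective_probability(bid, 100.0)
    print(p, s, "overestimated" if s > p else "underestimated")
```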

The notion of subjective probability has some serious logical difficulties. The scale of objective probability is bounded by 0 and 1. Should a subjective probability scale be similarly bounded, or not? If not, then many different subjective probabilities will correspond to the objective probabilities 0 and 1 (unless some transformation is used so that 0 and 1 objective probabilities correspond to infinite subjective probabilities, which seems unlikely). Considerations of the addition theorem to be discussed in a moment have occasionally led people to think of a subjective probability scale bounded at 0 but not at 1. This is surely arbitrary. The concept of absolute certainty is neither more nor less indeterminate than is the concept of absolute impossibility.

Even more drastic logical problems arise in connection with the addition theorem. If the objective probability of event A is P, and that of A not occurring is Q, then P + Q = 1. Should this rule hold for subjective probabilities? Intuitively it seems necessary that if we know the subjective probability of A, we ought to be able to figure out the subjective probability of not-A, and the only reasonable rule for figuring it out is subtraction of the subjective probability of A from that of complete certainty. But the acceptance of this addition theorem for subjective probabilities plus the idea of bounded subjective probabilities means that the subjective probability scale must be identical with the objective probability scale. Only for a subjective probability scale identical with the objective probability scale will the subjective probabilities of a collection of events, one of which must happen, add up to 1. In the special case where only two events, A and not-A, are considered, a subjective probability scale like S1 or S2 in Fig. 3 would meet the requirements of additivity, and this fact has led to some speculation about such scales, particularly about S1. But such scales do not meet the additivity requirements when more than two events are considered.
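The failure of additivity beyond two events can be demonstrated with any bounded scale that is symmetric about the point (0.5, 0.5). The actual shapes of S1 and S2 in Fig. 3 are not recoverable here, so the sketch below uses a hypothetical scale, s(p) = p + a·sin(2πp), chosen only because it satisfies s(0) = 0, s(1) = 1, and s(p) + s(1 - p) = 1.

```python
import math

def s_sym(p, a=0.1):
    """Hypothetical subjective probability scale with s(0) = 0, s(1) = 1 and
    s(p) + s(1 - p) = 1; strictly increasing on [0, 1] when |a| < 1/(2*pi)."""
    return p + a * math.sin(2 * math.pi * p)

# Two events, A and not-A: the subjective probabilities add to 1.
print(s_sym(0.3) + s_sym(0.7))
# Three equally likely events, one of which must happen: they do not.
print(3 * s_sym(1 / 3))  # about 1.26
```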


Fig. 3. Hypothetical Subjective Probability Curves (abscissa: objective probability, with 0.5 marked; ordinate: subjective probability)

One way of avoiding these difficulties is to stop thinking about a scale of subjective probabilities and, instead, to think of a weighting function applied to the scale of objective probabilities which weights these objective probabilities according to their ability to control behavior. Presumably, I was studying this ability in my experiments on probability preferences (55, 57, 58, 59). There is no reason why such weighted probabilities should add up to 1 or should obey any other simple combinatory principle.
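A weighting function of this kind can be sketched in code. The particular function below is entirely hypothetical: it crosses the diagonal at 0.2 only to echo the Preston and Baratta indifference point, and it is not a model proposed in the experiments on probability preferences, where no mathematical model was attempted. Money amounts stand in for utilities to isolate the effect of the weights.

```python
def w(p, k=2.0):
    """Hypothetical weighting function on objective probabilities: w(0) = 0,
    w(1) = 1, w(0.2) = 0.2; it overweights p < 0.2 and underweights p > 0.2.
    For k = 2 it is strictly increasing on [0, 1]."""
    return p + k * p * (0.2 - p) * (1 - p)

def weighted_value(bet):
    """Sum of w(p) * amount over a bet's (p, amount) outcomes.
    Money is used in place of utility, and the weights need not sum to 1."""
    return sum(w(p) * amount for p, amount in bet)

long_shot = [(0.1, 9.0), (0.9, -1.0)]  # expected value 0
even_odds = [(0.5, 1.0), (0.5, -1.0)]  # expected value 0
print(w(0.5) + w(0.5))                 # about 0.7, not 1
print(weighted_value(long_shot), weighted_value(even_odds))
```

Although both bets have zero expected value, the long shot receives a positive weighted value, and the weights on the two halves of the even-odds bet sum to about 0.7 rather than 1.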

Views and experiments which com- 
bine utility and subjective probability. 
The philosopher Ramsey published 
in 1926 (reprinted in 150) an essay 
on the subjective foundations of the 
theory of probability; this contained 
an axiom system in which both utility 
and subjective probability appeared. 
He used 0.5 subjective probability as 
a reference point from which to de- 
termine utilities, and then used these 
utilities to determine other sub- 
jective probabilities. Apparently, 
economists did not discover Ramsey's 
essay until after von Neumann and 
Morgenstern’s book aroused interest 
in the subject. The only other formal 
axiom system in which both utility 
and subjective probability play a 
part is one proposed by Savage 
(171), which is concerned with un- 
certainty, rather than risk, and uses 
the concept of subjective probability 
in its theory-of-probability sense. 

The most extensive and important 
experimental work in the whole field 
of decision making under risk and 
uncertainty is now being carried out 
by Coombs and his associates at the 
University of Michigan. Coombs’s 
thinking about utility and subjective 
probability is an outgrowth of his 
thinking about psychological scaling 
in general. (For a discussion of his 
views, see 43, 44, 45, 46, 47.) The 
essence of his work is the attempt to
measure both utility and subjective 
probability on an ordered metric 
scale. An ordered metric scale has all 
the properties of an ordinal scale, 
and, in addition, the distances be- 
tween some or all of the stimuli can 
be rank ordered. Coombs has de- 
veloped various experimental pro- 
cedures for obtaining such informa- 
tion about the spacings of stimuli. 
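What an ordered metric scale retains can be shown by extracting it from numeric utilities: the rank order of the stimuli plus the rank order of the distances between them. In this sketch the utilities are hypothetical, all distances are assumed distinct, and every distance is ranked, whereas an ordered metric may rank only some of them.

```python
from itertools import combinations

def ordered_metric(u):
    """What an ordered metric scale keeps from numeric utilities u (dict of
    stimulus -> number): the rank order of the stimuli and the rank order of
    all pairwise distances between them."""
    stimuli = sorted(u, key=u.get)
    distances = sorted(combinations(stimuli, 2), key=lambda pair: abs(u[pair[1]] - u[pair[0]]))
    return stimuli, distances

u1 = {"A": 0, "B": 1, "C": 3, "D": 7}
u2 = {"A": 0, "B": 2, "C": 5, "D": 12}  # not a linear transform of u1
u3 = {"A": 0, "B": 5, "C": 6, "D": 7}   # same order of stimuli, different spacing

print(ordered_metric(u1) == ordered_metric(u2))  # True
print(ordered_metric(u1) == ordered_metric(u3))  # False
```

The first two assignments are indistinguishable at the ordered-metric level even though they are not related by a linear transformation, which is why such a scale carries more information than an ordinal scale but less than an interval scale.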
In the most important article on 
utility and subjective probability to 
come out of the Coombs approach, 
Coombs and Beardslee (48) present 
an analysis of gambling decisions in- 
volving three independent variables: 
utility for prize, utility for stake, and 
subjective probability. All three are 
assumed measurable only up to an 
ordered metric, although it is as- 
sumed that the psychological prob- 
ability of losing the stake is one minus 
the psychological probability of 


winning the prize, an assumption that 


limits the permissible underlying 
psychological probability functions 
to shapes like those in Fig. 3. An 
elaborate graphic analysis of the in- 
difference surfaces in this three- 
dimensional space is given, contain- 
ing far too many interesting relation- 
ships to summarize here. An ex- 
periment based on this model was de- 
signed. Coombs is reluctant to use 
sums of money as the valuable ob- 
jects in his experiments because of 
the danger that subjects will respond 
to the numerical value of the amount 
of dollars rather than to the psycho- 
logical value. Therefore he used 
various desirable objects (e.g., a
radio) as stimuli, and measured their 
utility by the techniques he has de- 
veloped to obtain ordered metric 
scales. He used simple numerical 
statements of probability as the 
probability stimuli, and assumed that 
subjective probability was equal to 
objective probability. The subject
from whose judgments the ordered 
metric utility measurement was con- 
structed was then presented with 
imaginary bets involving these ob- 
jects and probabilities, and it turned 
out that she almost always chose the 
one with the higher expected utility. 
This experiment is significant only 
as an illustration of the application 
of the method; the conclusion that 
subjects attempt to maximize ex- 
pected utility cannot very comfort- 
ably be generalized to other subjects 
and to real choices without better 
evidence. 

Coombs and Milholland (49) did a 
much more elaborate experiment in 
which they established ordered metric 
scales, both for the utilities of a col- 
lection of objects and for the subjec- 
tive probabilities of a collection of 
statements (e.g., Robin Roberts will 
win 20 games next year). Statements 
and objects were combined into 
“bets,” and the two subjects for 
whom the ordered metric scales had 
been established were asked to make 
judgments about which bet they 
would most, and which they would 
least, prefer from among various 
triads of bets. These judgments were 
examined to discover whether or not 
they demonstrated the existence of 
at least one convex indifference curve 
between utility and subjective probability (the requirements for demonstrating the convexity of an indifference curve by means of ordered
metric judgments are fairly easy to 
state). A number of cases consistent 
with a convex indifference curve were 
found, but a retest of the ordered 
metric data revealed changes which 
eliminated all of the cases consistent 
with a convex indifference curve for 
one subject, and all but one case for 
the other. It is not possible to make 
a statistical test of whether or not 
that one case might have come about
by chance. No evidence was found 
for the existence of concave indiffer- 
ence curves, which are certainly in- 
consistent with the theory of risky 
decisions. This experiment is a fine 
example of the strength and weak- 
ness of the Coombs approach. It 
makes almost no assumptions, takes 
very little for granted, and avoids 
the concept of error of judgment; as 
a result, much of the potential in- 
formation in the data is unused and 
rarely can any strong conclusions be 
drawn. 

A most disturbing possibility is 
raised by experiments by Marks (133) 
and Irwin (94) which suggest that the 
shape of the subjective probability 
function is influenced by the utilities 
involved in the bets. If utilities and 
subjective probabilities are not inde- 
pendent, then there is no hope of pre- 
dicting risky decisions unless their 
law of combination is known, and it 
seems very difficult to design an ex- 
periment to discover that law of com- 
bination. However, the main dif- 
ferences that Marks and Irwin found 
were between probabilities attached 
to desirable and undesirable alterna- 
tives. It is perfectly possible that 
there is one subjective probability 
function for bets with positive ex- 
pected values and a different one for 
bets with negative expected values, 
just as the negative branch of the 
Markowitz utility function is likely 
to be different from the positive 
branch. The results of my probabil- 
ity preference experiments showed 
very great differences between the 
probability preference patterns for 
positive and for negative expected- 
value bets (57), but little difference 
between probability preferences at 
different expected-value levels so 
long as zero expected value was not 
crossed (59). This evidence supports 
the idea that perhaps only two subjective probability functions are necessary.

Santa Monica Seminar. In the 
summer of 1952 at Santa Monica, 
California, a group of scientists con- 
ferred on problems of decision mak- 
ing. They met in a two-month semi- 
nar sponsored by the University of 
Michigan and the Office of Naval 
Research. The dittoed reports of 
these meetings are a gold mine of 
ideas for the student of this problem. 
Some of the work done at this semi- 
nar is now being prepared for a book 
on Decision Processes edited by R. M. 
Thrall, C. H. Coombs, and R. L. 
Davis, of the University of Michigan. 

Several minor exploratory experi- 
ments were done at this seminar. 
Vail (190) did an experiment in which 
he gave four children the choice of 
which side of various bets they wanted to be on. On the assumption of linear utilities, he was able to compute subjective probabilities for these children. The same children, however, were used as subjects for a
number of other experiments; so, 
when Vail later tried them out on 
some other bets, he found that they 
consistently chose the bet with the 
highest probability of winning, re- 
gardless of the amounts of money in- 
volved. When 50-50 bets were in- 
volved, one subject consistently chose 
the bet with the lowest expected 
value. No generalizable conclusions 
can be drawn from these experiments. 
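
The text does not spell out Vail's computation. One plausible reading, sketched here purely as an assumption, is this. Suppose utility is linear and one side of a bet pays `win` if an event occurs and costs `loss` otherwise. A child who takes that side reveals a subjective probability s with s·win - (1 - s)·loss ≥ 0, that is, s ≥ loss/(win + loss). Taking the other side reveals the reverse inequality, so a set of such choices brackets s. The function name and data layout are hypothetical.

```python
from fractions import Fraction
from typing import Iterable, Tuple

# Each observation: (win, loss, took_first_side). The first side pays `win`
# if the event occurs and costs `loss` if it does not (hypothetical layout).
Observation = Tuple[int, int, bool]


def subjective_probability_bounds(observations: Iterable[Observation]) -> Tuple[Fraction, Fraction]:
    """Bracket a subjective probability s, assuming utility is linear in money."""
    lower, upper = Fraction(0), Fraction(1)
    for win, loss, took_first_side in observations:
        # Break-even point: s * win - (1 - s) * loss = 0.
        threshold = Fraction(loss, win + loss)
        if took_first_side:
            lower = max(lower, threshold)  # choice implies s >= threshold
        else:
            upper = min(upper, threshold)  # choice implies s <= threshold
    return lower, upper


if __name__ == "__main__":
    # Hypothetical choices: took an even-money side, refused one risking 3 to win 1.
    print(subjective_probability_bounds([(1, 1, True), (1, 3, False)]))
    # (Fraction(1, 2), Fraction(3, 4))
```

If the returned lower bound ever exceeds the upper bound, no single s with linear utility fits the choices.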

Kaplan and Radner (100) tried out 
a questionnaire somewhat like 
Coombs’s method of measuring sub- 
jective probability. Subjects were 
asked to assign numbers to various 
statements. The numbers could be 
anything from 0 to 100 and were to 
represent the likelihood that the 
statement was true. The hypotheses 
to be tested were: (a) for sets of exhaustive and mutually exclusive statements in which the numbers assigned (estimates of degree of belief) were nearly equal, the sums of these numbers over a set would increase with the number of alternatives (because low probabilities would be overestimated); (b) for sets with the same number of alternatives, those with
one high number assigned would have 
a lower set sum than those with no 
high numbers. The first prediction 
was verified; the second was not. 
Any judgments of this sort are so 
much more likely to be made on the 
basis of number preferences and 
similar variables than on subjective 
probabilities that they offer very 
little hope as a method of measuring 
subjective probabilities. 

Variance preferences. Allais (2, 3, 
4) and Georgescu-Roegen (78) have 
argued that it is not enough to apply 
a transform on objective value and on 
objective probability in order to pre- 
dict risky decisions from expected 
utility (see also 188); it is also neces- 
sary to take into account at least the 
variance, and possibly the higher 
moments, of the utility distribution. 
There are instances in which this 
argument seems convincing. You 
would probably prefer the certainty 
of a million dollars to a 50-50 chance 
of getting either four million or noth- 
ing. I do not think that this prefer- 
ence is due to the fact that the ex- 
pected utility of the 50-50 bet is less 
than the utility of one million dollars 
to you, although this is possible. A 
more likely explanation is simply 
that the variances of the two propo- 
sitions are different. Evidence in 
favor of this is the fact that if you 
knew you would be offered this choice 
20 times in succession, you would 
probably take the 50-50 bet each 
time. Allais (5) has constructed a 
number of more sophisticated examples of this type. However, from a
simple-minded psychological point of 
view, these examples are irrelevant. 
It is enough if the theory of choice 
can predict choices involving familiar 
amounts of money and familiar
probability differences—choices such 
as those which people are accustomed 
to making. It may be necessary for 
economic theory that the theory of 
choice be universal and exceptionless, 
but experimental psychologists need 
not be so ambitious. This is fortu- 
nate, because the introduction of the 
variance and higher moments of the 
utility distribution makes the prob- 
lem of applying the theory experi- 
mentally seem totally insoluble. It is 
difficult enough to derive reasonable 
methods of measuring utility alone 
from risky choices; when it also be- 
comes necessary to measure subjec- 
tive probability and to take the 
higher moments of the utility dis- 
tribution into account, the problem 
seems hopeless. Allais apparently 
hopes to defeat this problem by using 
psychophysical methods to measure 
utility (and presumably subjective 
probability also). This is essentially 
what Coombs has done, but Coombs 
has recognized that such procedures 
are unlikely to yield satisfactory 
interval scales. The dollar scale of 
the value of money is so thoroughly 
taught to us that it seems almost im- 
possible to devise a psychophysical 
situation in which subjects would 
judge the utility, rather than the dol- 
lar value, of dollars. They might 
judge the utility of other valuable 
objects, but since dollars are the 
usual measure of value, such judg- 
ments would be less useful, and even 
these judgments would be likely to be 
contaminated by the dollar values of 
the objects. I would get more utility 
from a new electric shaver than I would from a new washing machine, but because of my knowledge of the
relative money values of these ob- 
jects, I would certainly choose the 
washing machine if given a choice 
between them. Somewhat similar 
arguments can be applied against 
using psychophysical methods to 
measure subjective probability. A
final point is that, since these subjec- 
tive scales are to be used to predict 
choices, it would be best if they could 
be derived from similar choices. 
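
The million-dollar example earlier in this paragraph can be made concrete. For this illustration only, treat dollars as if utility were linear. The 50-50 bet then has twice the expected value of the sure million, but a single play has a standard deviation of two million dollars. Over 20 independent plays, the chance of ending with less than the sure $20,000,000 is under 1 per cent. The sketch computes both with exact binomial counts. The constant and function names are illustrative.

```python
from fractions import Fraction
from math import comb

SURE = 1_000_000   # certain payment per offer
PRIZE = 4_000_000  # prize of the 50-50 bet (otherwise nothing)


def single_play_moments():
    """Mean and variance, in dollars, of one 50-50 bet for PRIZE or nothing."""
    mean = Fraction(PRIZE, 2)
    variance = Fraction(PRIZE ** 2, 2) - mean ** 2
    return mean, variance


def prob_bets_fall_short(n_plays: int) -> Fraction:
    """Probability that n independent bets pay strictly less in total than
    taking the sure amount n times."""
    target = n_plays * SURE
    short = sum(comb(n_plays, wins) for wins in range(n_plays + 1)
                if wins * PRIZE < target)
    return Fraction(short, 2 ** n_plays)


if __name__ == "__main__":
    mean, variance = single_play_moments()
    print(mean, variance)                   # 2000000 4000000000000
    print(prob_bets_fall_short(1))          # 1/2
    print(float(prob_bets_fall_short(20)))  # about 0.0059
```

The shrinking shortfall probability as plays are repeated is one way to see why the variance argument loses force for repeated choices.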

Other approaches. Shackle (175)
has proposed a theory of decision 
making under risk and uncertainty. 
This theory is unique in that it does 
not assume any kind of maximizing 
behavior. For every possible out- 
come of a decision made in a risky or 
uncertain situation, Shackle assumes 
that there is a degree of potential 
surprise that this, rather than some 
other, outcome would occur. Every 
outcome-potential surprise pair is 
ranked in accordance with its ability 
to stimulate the mind (stimulation increases with increasing outcome and decreases with increasing potential surprise). The highest-ranking positive outcome-potential surprise pair
and the highest-ranking negative pair 
are found, and these two possibilities 
alone determine what the individual 
will do. Semi-mathematical methods 
are used to predict the outcome of 
consideration of possible lines of ac- 
tion. Although attempts have been 
made to relate it to Wald’s minimax 
principle for statistical decision func- 
tions (see below), the fact remains 
that most critics of the Shackle point 
of view have judged it to be either too 
vague to be useful, or, if specified in 
detail, too conducive to patently ab- 
surd predictions (e.g., 201). 
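
Shackle's selection step can be sketched as follows. The sketch assumes that potential surprise is scored from 0 (none) to 1 (maximal). It uses a hypothetical stimulation function that rises with the size of the outcome and falls as potential surprise grows. This summary does not give Shackle's ranking in numerical form, so the function is an illustration only.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[float, float]  # (outcome, potential surprise in [0, 1])


def hypothetical_stimulation(outcome: float, surprise: float) -> float:
    # Illustrative only: grows with the magnitude of the outcome and
    # shrinks as potential surprise approaches 1.
    return abs(outcome) * (1.0 - surprise)


def focus_pairs(pairs: List[Pair],
                stimulation: Callable[[float, float], float] = hypothetical_stimulation
                ) -> Tuple[Optional[Pair], Optional[Pair]]:
    """Return the highest-ranking positive pair and the highest-ranking negative pair."""
    gains = [p for p in pairs if p[0] > 0]
    losses = [p for p in pairs if p[0] < 0]
    best_gain = max(gains, key=lambda p: stimulation(*p), default=None)
    best_loss = max(losses, key=lambda p: stimulation(*p), default=None)
    return best_gain, best_loss


if __name__ == "__main__":
    outcomes = [(100, 0.0), (500, 0.6), (1000, 0.95), (-50, 0.0), (-400, 0.5)]
    print(focus_pairs(outcomes))  # ((500, 0.6), (-400, 0.5))
```

In Shackle's theory, only the two returned pairs would then bear on the decision; every other outcome is ignored.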

Shackle’s point of view was developed primarily to deal with unique
choices—choices which can be made 
only once. Allais (3) has similarly 
criticized conventional utility theory’s attack on this problem. Since
the usual frequency theory of prob- 
ability conceives of the probability as 
the limit of the outcomes of a large 
number of similar trials, it is ques- 
tionable that notions which use prob- 
ability in the ordinary sense (like the 
notion of maximizing expected util- 
ity) are applicable to unique choices. 
However, this seems to be an experi- 
mental problem. If notions which use 
ordinary probability are incapable of 
predicting actual unique choices, 
then it will be necessary to seek other 
theoretical tools. But so long as a 
generally acceptable probability can 
be defined (e.g., as in the unique toss 
of a coin), it is not necessary to as- 
sume a priori that theories based on 
conventional probabilities will be in- 
adequate. When no generally ac- 
ceptable probability can be defined, 
then the problem becomes very dif- 
ferent. 

Cartwright and Festinger (38, 41) 
have proposed a theory about the 
time it takes to make decisions which 
is in some ways similar to those dis- 
cussed in this section. The main dif- 
ference is that they add the concept 
of restraining forces, and that they 
conceive of all subjective magnitudes 
as fluctuating randomly around a 
mean value. From this they deduce 
various propositions about decision 
times and the degree of certainty 
which subjects will feel about their 
decisions, and apparently these propositions work out experimentally
pretty well (38, 39, 61, 62). The 
Lewinian theoretical orientation 
seems to lead to this kind of model; 
Lewin, Dembo, Festinger, and Sears 
(122) present a formally similar 
theory about level of aspiration. Of 
course, the notion of utility is very 
similar to the Lewinian notion of 
valence. 

Landahl (115) has presented a
mathematical model for risk-taking 
behavior based on the conceptual 
neurology of the mathematical bio- 
physics school. 

Psychological comments. The area 
of risky decision making is full of 
fascinating experimental problems. 
Of these, the development of a satis- 
factory scale of utility of money and 
of subjective probability must come 
first, since the theory of risky de- 
cision making is based on these no- 
tions. The criterion for satisfactori- 
ness of these scales must be that they 
successfully predict choices other 
than those from which they were de- 
rived. To be really satisfactory, it is 
desirable that they should predict 
choices in a wide variety of differing 
situations. Unlike the subjective 
scales usually found in psychophys- 
ics, it is likely that these scales will 
differ widely from person to person, 
so a new determination of each scale must be made for each new subject.
It can only be hoped that the scales 
do not change in time to any serious 
degree; if they do, then they are 
useless. 

Once scales of utility and subjective probability are available, then many interesting questions arise. What about the addition theorem for subjective probabilities? Does gambling itself have utility, and how much? To what extent can these subjective scales be changed by learning?
To what degree do people differ, and 
can these differences be correlated 
with environmental, historical, or 
personality differences? Finally, psy- 
chologists might be able to shed light 
on the complex economic problem of 
interacting utilities of different goods. 

The area of risky decision making, 
like the area of the theory of games, 
tends to encourage in those inter- 
ested in it the custom of carrying out 
small pilot experiments on their sons,
laboratory assistants, or secretaries. 
Such experiments are too seldom 
adequately controlled, and are al- 
most never used as a basis for larger- 
scale, well-designed experiments. 
Whether an ill-designed and hap- 
hazardly executed little experiment is 
better than no experiment at all is 
questionable. The results of such 
pilot experiments too often are picked 
up and written into the literature 
without adequate warning about the 
conditions under which they were 
performed and the consequent limita- 
tions on the significance of the results. 


THE TRANSITIVITY OF CHOICES 

In the section on riskless choices 
this paper presented a definition of 
economic man. The most important 
part of this definition can be summed 
up by saying that economic man is 
rational. The concept of rationality
involves two parts: that of a weak 
ordering of preferences, and that of 
choosing so as to maximize some- 
thing. Of these concepts, the one 
which seems most dubious is the one 
of a weakly ordered preference field. 
This is dubious because it implies 
that choices are transitive; that is, if 
A is preferred to B, and B is preferred 
to C, then A is preferred to C. 

Two economists have designed experiments specifically intended to
test the transitivity of choices. Pap- 
andreou performed an elaborate and 
splendidly controlled experiment 
(145) designed to discover whether or 
not intransitivities occurred in im- 
agined-choice situations. He pre- 
pared triplets of hypothetical bun- 
dles of admissions to plays, athletic 
contests, concerts, etc., and required 
his subjects to choose between pairs 
of bundles. Each bundle consisted of 
a total of four admissions to two 
events, e.g., 3 plays and 1 tennis 
tournament. In the main experiment, each bundle is compared with
two others involving the same kinds 
of events, but in the better designed 
auxiliary experiment, a total of six 
different events are used, so that each 
bundle has no events in common with 
the other two bundles in its triplet. 
Since there are three bundles in each 
triplet, there are three choices be- 
tween pairs for each triplet, and 
these choices may, or may not, be 
transitive. The subjects were per- 
mitted to say that they were indiffer- 
ent between two bundles; conse- 
quently there were 27 possible con- 
figurations of choices, of which only 
13 satisfied the transitivity axiom. 
In the main experiment, 5 per cent 
of the triplets of judgments were 
intransitive; in the auxiliary experi- 
ment, only 4 per cent. Papandreou 
develops a stochastic model for choices under such conditions; the results are certainly consistent with the amount of intransitivity permitted by his model. Papandreou concludes that at least for his specific
experimental conditions, transitivity 
does exist. 
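
Papandreou's count of 27 configurations, 13 of them transitive, can be checked by enumeration. Each of the three pairs in a triplet receives one of three judgments. In this sketch a configuration counts as transitive when the relation "preferred or indifferent" is transitive. That reading of the axiom is an assumption, though it reproduces the stated count. The names are illustrative.

```python
from itertools import product

ITEMS = ("A", "B", "C")
PAIRS = (("A", "B"), ("A", "C"), ("B", "C"))
# Judgment on pair (x, y): ">" x preferred, "<" y preferred, "~" indifferent.
JUDGMENTS = (">", "<", "~")


def weakly_prefers(config, x, y):
    """True if x is preferred or indifferent to y under this configuration."""
    if x == y:
        return True
    if (x, y) in config:
        return config[(x, y)] in (">", "~")
    return config[(y, x)] in ("<", "~")


def is_transitive(config):
    # For all x, y, z: x >= y and y >= z must imply x >= z.
    return all(
        weakly_prefers(config, x, z)
        for x in ITEMS for y in ITEMS for z in ITEMS
        if weakly_prefers(config, x, y) and weakly_prefers(config, y, z)
    )


configs = [dict(zip(PAIRS, judged)) for judged in product(JUDGMENTS, repeat=len(PAIRS))]
transitive = [c for c in configs if is_transitive(c)]
print(len(configs), len(transitive))  # 27 13
```

The 13 transitive configurations are the 6 strict orderings, the 6 orderings with one tied pair, and the single configuration of complete indifference.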

May (138), using different kinds of 
stimuli in a less elaborate experiment, 
comes up with results less consistent with transitivity. May required a
classroom group to make pairwise 
choices between three marriage part- 
ners who were identified only by 
saying how intelligent, good looking, 
and rich they were. Judgments of 
indifference were not permitted. The 
results were that 27 per cent of the 
subjects gave intransitive triads of 
choices. May suggests, very plausi- 
bly, that intransitive choices may be 
expected to occur whenever more 
than one dimension exists in the 
stimuli along which subjects may 
order their preferences. However, 
May would probably have gotten 
fewer intransitivities if he had permitted the indifference judgment. If
subjects are really indifferent among 
all three of the elements of a triad of 
objects, but are required to choose 
between them in pairs and do so by 
chance, then they will choose in- 
transitively one-fourth of the time. 
Papandreou’s stochastic model gives 
one theory about what happens 
when preferences diverge just slightly 
from indifference, but presumably a 
more detailed model can be worked 
out. Papandreou’s model permits 
only three states: prefer A to B, 
prefer B to A, and indifferent. It 
ought to be possible to base a model 
for such situations on the cumulative 
normal curve, and thus to permit any 
degree of preference. For every com- 
bination of degrees of preference, 
such a model would predict the fre- 
quency of intransitive choices. 
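
The one-fourth figure follows from counting. Suppose a subject is indifferent among A, B, and C but must choose within each pair, and each choice behaves like a fair coin flip. There are then 2^3 = 8 equally likely patterns of choices, and exactly 2 of them are circular (A over B over C over A, and the reverse). A minimal check:

```python
from fractions import Fraction
from itertools import product


def is_circular(a_over_b: bool, b_over_c: bool, a_over_c: bool) -> bool:
    # Circular triads: A>B, B>C, C>A  or  B>A, C>B, A>C.
    return (a_over_b and b_over_c and not a_over_c) or (
        not a_over_b and not b_over_c and a_over_c)


patterns = list(product((True, False), repeat=3))
p_circular = Fraction(sum(is_circular(*p) for p in patterns), len(patterns))
print(p_circular)  # 1/4
```

A cumulative-normal model of the kind suggested above would replace the coin flips with choice probabilities that depend on the degree of preference.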

In the paired comparisons among 
bets (57) described in the section on 
risky choices, quite elaborate in- 
transitivities could and did occur. 
However, it is easy to show that any 
intransitivity involving four or more 
objects in a paired comparisons
judgment situation will necessarily 
produce at least one intransitivity in- 
volving three objects. Consequently, 
the intransitive triplet or circular 
triad is the best unit of analysis for 
intransitivities in these more complicated judgment situations. I
counted the frequency of occurrence 
of circular triads and found that they 
regularly occurred about 20 per cent 
of the total number of times they 
could occur. (Of course, no indiffer- 
ence judgments could be permitted.) 
The experiment fulfills May’s cri- 
terion for the occurrence of intransi- 
tivities, since both probability and 
amount of money were present in 
each bet, and subjects could be ex- 
pected to take both into account 
when making choices. It might be
supposed that the difference between 
the imaginary choices of the Papan- 
dreou and May experiments and the 
real choices in my experiment would 
lead to differences in the frequency of 
occurrence of intransitivities, but 
there were no substantial differences 
in my experiment between the fre- 
quencies of occurrence in the just- 
imagining sessions and in the real 
gambling sessions, and what differ- 
ences there were, were in the direction 
of greater transitivity when really
gambling. These facts should facili- 
tate further experiments on this prob- 
lem. 
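
Circular triads in a complete set of paired comparisons without ties can be counted by brute force or from each object's number of wins. The second method rests on a standard counting identity. Every transitive triad has exactly one member that beats both of the others, so the number of circular triads is C(n, 3) minus the sum over objects of C(s_i, 2), where s_i is the number of comparisons object i won. The sketch computes both. The function names and the example judgments are hypothetical.

```python
from itertools import combinations
from math import comb


def circular_triads_brute_force(items, beats):
    """beats: set of (winner, loser) pairs, one per unordered pair (no ties)."""
    count = 0
    for a, b, c in combinations(items, 3):
        triad = (a, b, c)
        # A triad is circular when no member beats both of the others.
        if not any(all((x, y) in beats for y in triad if y != x) for x in triad):
            count += 1
    return count


def circular_triads_from_scores(items, beats):
    """Same count from the win totals: C(n, 3) - sum of C(s_i, 2)."""
    wins = {x: sum(1 for (w, _) in beats if w == x) for x in items}
    return comb(len(items), 3) - sum(comb(s, 2) for s in wins.values())


if __name__ == "__main__":
    items = ["A", "B", "C", "D"]
    # A cycle among A, B, C; D beats everyone (hypothetical judgments).
    beats = {("A", "B"), ("B", "C"), ("C", "A"), ("D", "A"), ("D", "B"), ("D", "C")}
    print(circular_triads_brute_force(items, beats),
          circular_triads_from_scores(items, beats))  # 1 1
```

The score-based count needs only each object's total number of wins, which makes it convenient for large paired-comparison designs.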

In one sense, transitivity can never 
be violated. A minimum of three 
choices is required to demonstrate 
intransitivity. Since these choices will necessarily be made in sequence, it can always be argued that the person may have changed his tastes between the first choice and the third. However, unless the assumption of constancy of tastes over the period of experimentation is made, no experiments on choice can ever be meaningful, and the whole theory of choice becomes empty (see 184 for a similar situation). So this quibble can be rejected at once.

Utility maximization will not work except with a transitive preference field. Consequently, if the models discussed in this paper are to predict experimental data, it is necessary that intransitivities in these data be infrequent enough to be considered as errors. However, from a slightly different point of view (54) the occurrence or nonoccurrence of transitive choice patterns is an experimental phenomenon, and presumably a lawful one. May has suggested what that law is: Intransitivities occur when there are conflicting stimulus dimensions along which to judge. This notion could certainly be tested
and made more specific by appropri- 
ate experiments. 

A final contribution in a related, 
but different, area is Vail’s stochastic 
utility model (191). Vail assumes 
that choices are dependent on utili- 
ties that oscillate in a random man- 
ner around a mean value. From this 
assumption plus a few other reason- 
able ones, he deduces that if the 
over-all preference is 1 > 2 > 3, and if 1 is preferred to 2 more than 2 is preferred to 3, then the frequencies of occurrence of the six possible transitive orderings should be ordered as follows: 123 > 132 > 213 > 312 > 231 > 321. This result is certainly easy
to test experimentally, and sounds 
plausible. 
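
Vail's predicted ordering can be checked by simulation under one concrete version of his assumptions. In this sketch each momentary utility is its mean plus independent normal noise. The means are chosen hypothetically so that 1 exceeds 2 by more than 2 exceeds 3. Tallying the ordering produced by each draw should reproduce 123 > 132 > 213 > 312 > 231 > 321. The normal noise, the means, the seed, and the sample size are assumptions of this sketch, not details taken from Vail.

```python
import random
from collections import Counter


def ordering_frequencies(means, sd=1.0, trials=100_000, seed=1):
    """Tally the rank orderings produced by noisy utilities around fixed means."""
    rng = random.Random(seed)
    counts = Counter()
    labels = [str(i + 1) for i in range(len(means))]
    for _ in range(trials):
        draws = [rng.gauss(m, sd) for m in means]
        # Order alternatives from highest to lowest momentary utility.
        order = "".join(label for _, label in sorted(zip(draws, labels), reverse=True))
        counts[order] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical means: the 1-2 gap (1.0) is larger than the 2-3 gap (0.5).
    freqs = ordering_frequencies([1.0, 0.0, -0.5])
    for order, n in freqs.most_common():
        print(order, n)
```

An experimental test would compare such frequencies with the orderings subjects actually produce on repeated presentations.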


THEORY OF GAMES AND OF DECISION FUNCTIONS*

This section will not go into the theory of games or into the intimately related subject of statistical decision functions at all thoroughly. These are mathematical subjects of a highly technical sort, with few statements which lend themselves to experimental test. Rather, the purpose of this section is to show how these subjects relate to what has gone before, to give a brief summary of the contents of Theory of Games and Economic Behavior by von Neumann and Morgenstern (197), and to describe a few experiments in the area of game playing, experiments which are stimulated by the theory of games although not directly relevant to it.

* Marschak (134), Hurwicz (92), Neisser (143), Stone (181), and Kaysen (107) published reviews of The Theory of Games and Economic Behavior which present the fundamental ideas in much simpler language than the original source. Marschak works out in detail the possible solutions of a complicated three-person bargaining game, and thereby illustrates the general nature of a solution. The two volumes of Contributions to the Theory of Games (112, 113), plus McKinsey’s book on the subject (129), provide an excellent bibliography of the mathematical literature. McKinsey's book is an exposition of the fundamental concepts, intended as a textbook, which is simpler than von Neumann and Morgenstern and pursues certain topics further. Wald's book (198) is, of course, the classical work on statistical decision functions. Bross’s book (35) presents the fundamental ideas about statistical decision functions more simply, and with a somewhat different emphasis. Girshick and Blackwell's book (79) is expected to be a very useful presentation of the field.

The theory of games. The theory of 
games probably originated in the 
work of Borel (31, 32, 33, 34; see also 
71, 72) in the 1920's. In 1928, von 
Neumann (195), working independ- 
ently of Borel, published the first 
proof of the fundamental theorem in 
the theory, a theorem that Borel had 
not believed to be generally true. 
However, the subject did not become 
important until 1944, when von 
Neumann and Morgenstern pub- 
lished their epoch-making book (196). 
(A second edition, with an appendix 
on cardinal utility measurement, 
came out in 1947 [197].) Their pur- 
pose was to analyze mathematically a 
very general class of problems, which 
might be called problems of strategy. 
Consider a game of tic-tac-toe. You 
know at any moment in the game 
what the moves available to your op- 
ponent are, but you do not know 
which one he will choose. The only 
information you have is that his 
choice will not, in general, be com- 
pletely random; he will make a move 
which is designed in some way to in- 
crease his chance of winning and di- 
minish yours. Thus the situation is 
one of uncertainty rather than risk. 
Your goals are similar to your op- 
ponent’s. Your problem is: what 
strategy should you adopt? The 
theory of games offers no practical 
help in developing strategies, but it 
does offer rules about how to choose 
among them. In the case of tic-tac-toe, these rules are trivial, since
either player can force a draw. But 
in more complicated games of strat- 
egy, these rules may be useful. In 
particular, the theory of games may 
be helpful in analyzing proper strat- 
egy in games having random ele- 
ments, like the shuffling of cards, or 
the throwing of dice. It should be 
noted that the concept of a game is an 
exceedingly general concept. A scien- 
tist in his laboratory may be con- 
sidered to be playing a game against 
Nature. (Note, however, that we 
cannot expect Nature to try to defeat 
the scientist.) Negotiators in a labor 
dispute are playing a game against 
one another. Any situation in which 
money (or some valuable equivalent) 
may be gained as the result of a 
proper choice of strategy can be con- 
sidered as a game. 

To talk about game theory, a few 
technical terms are necessary. A strategy is a set of personal rules for playing the game. For each possible
first move on your part, your op- 
ponent will have a possible set of re- 
sponses. For each possible response 
by your opponent, you will have a set 
of responses, and so on through the 
game. A strategy is a list which speci- 
fies what your move will be for every 
conceivable previous set of moves of 
the particular game you are playing. 
Needless to say, only for the simplest 
games (e.g., matching pennies) does 
this concept of strategy have any 
empirical meaning. 

Associated with strategies are im- 
putations. An imputation is a set of 
payments made as a result of a game, 
one to each player. In general, differ- 
ent imputations will be associated 
with different sets of strategies, but 
for any given set of strategies there 
may be more than one imputation 
(in games involving coalitions). 

Imputation X is said to dominate 
imputation Y if one or more of the
players has separately greater gains 
(or smaller losses) in X than in Y and 
can, by acting together (in the case of 
more than one player), enforce the 
occurrence of X, or of some other imputation at least as good. The relationship of domination is not transitive.

A solution is a set of imputations, 
none of which dominates another, 
such that every imputation outside 
the solution is dominated by at least 
one imputation within the solution. 
Von Neumann and Morgenstern as- 
sert that the task of the theory of 
games is to find solutions. For any 
game, there may be one or more than 
one. One bad feature of the theory of 
games is that it frequently gives a 
large, or even infinite, number of solu- 
tions for a game. 

The above definitions make clear 
that the only determiner of behavior 
in games, according to this theory, is 
the amounts of money which may be 
won or lost, or the expected amounts 
in games with random elements. The
fun of playing, if any, is irrelevant. 

The minimax loss principle. The 
notions of domination and of solution 
imply a new fundamental rule for 
decision making—a rule sharply dif- 
ferent from the rule of maximizing 
utility or expected utility with which 
this paper has been concerned up to 
this section. This rule is the rule of 
minimizing the maximum loss, or, 
more briefly, minimax loss. In other
words, the rule is to consider, for each 
possible strategy that you could 
adopt, what the worst possible out- 
come is, and then to select that strat- 
egy which would have the least ill- 
effects if the worst possible outcome 
happened. Another way of putting 
the same idea is to call it the principle 
of maximizing the minimum gain, or 
maximin gain. This rule makes con- 
siderable sense in two-person games 
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when you consider that the other 
player is out to get you, and so will 
do his best to make the worst possible 
outcome for you occur. If this rule is 
expressed geometrically, it asserts that the point you should seek is a saddle-point, like the highest point in a mountain pass (the best rule for crossing mountains is to minimize the maximum height, so explorers seek out such saddle-points).

Before we go any further, we need a few more definitions. Games may be among any number of players, but the simplest game is a two-person game, and it is this kind of game which has been most extensively and most successfully analyzed. Fundamentally, two kinds of payoff arrangements are possible. The simplest and most common is the one in which one player wins what the other player loses, or, more generally, the one for which the sum of all the payments made as a result of the game is zero. This is called a zero-sum game. In nonzero-sum games, analytical complexities arise. These can be diminished by assuming the existence of a fictitious extra player, who wins or loses enough to bring the sum of payments back to zero. Such a fictitious player cannot be assumed to have a strategy and cannot, of course, interact with any of the other players.

In zero-sum two-person games, what will happen? Each player, according to the theory, should pick his minimax strategy. But will this result in a stable solution? Not always. Sometimes the surface representing the possible outcomes of the game
does not have a saddle-point. In this 
case, if player A chooses his minimax 
strategy, then player B will have an 
incentive not to use his own minimax 
strategy, because having found out 
his opponent's strategy, he can gain 
more by some other strategy. Thus 
the game has no solution. 
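
The minimax (maximin) rule and the saddle-point test described above can be sketched for a two-person zero-sum game given as a payoff matrix. Entry [i][j] is what the row player wins, and the column player loses, when row strategy i meets column strategy j. The row player's maximin choice is the row whose worst entry is largest. A pure-strategy saddle point exists exactly when that guaranteed value equals the column player's minimax value. The matrices and function names are hypothetical examples.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def maximin_row(payoffs: Matrix) -> Tuple[int, float]:
    """Row whose worst-case gain is largest, and that guaranteed gain."""
    worst = [min(row) for row in payoffs]
    best = max(range(len(payoffs)), key=lambda i: worst[i])
    return best, worst[best]


def minimax_column(payoffs: Matrix) -> Tuple[int, float]:
    """Column whose worst-case loss (largest entry) is smallest, and that loss."""
    n_cols = len(payoffs[0])
    worst = [max(row[j] for row in payoffs) for j in range(n_cols)]
    best = min(range(n_cols), key=lambda j: worst[j])
    return best, worst[best]


def has_pure_saddle_point(payoffs: Matrix) -> bool:
    # Maximin never exceeds minimax; equality means neither player gains by deviating.
    return maximin_row(payoffs)[1] == minimax_column(payoffs)[1]


if __name__ == "__main__":
    with_saddle = [[3, 1], [4, 2]]
    matching_pennies = [[1, -1], [-1, 1]]
    print(maximin_row(with_saddle), minimax_column(with_saddle))  # (1, 2) (1, 2)
    print(has_pure_saddle_point(with_saddle),
          has_pure_saddle_point(matching_pennies))  # True False
```

Matching pennies shows the failure described in the text. The row player's guaranteed value (-1) is below the column player's (1), so whichever pure strategy one player adopts, the other can profit by switching.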
Various resolutions of this problem
are possible. Von Neumann and 
Morgenstern chose to introduce the 
notion of a mixed strategy, which is a 
probability distribution of two or 
more pure strategies. The fundamen- 
tal theorem of the theory of games is 
that if both players in a zero-sum 
two-person game adopt mixed strat- 
egies which minimize the maximum 
expected loss, then the game will al- 
ways have a saddle-point. Thus each 
person will get, in the long run, his 
expected loss, and will have no in- 
centive to change his behavior even 
if he should discover what his op- 
ponent’s mixed strategy is. Since A is 
already getting the minimum possible 
under the strategy he chose, any 
change in strategy by B will only in- 
crease A's payoff, and therefore cause 
B to gain less or lose more than he 
would by his own minimax strategy. 
The same is true of B. 
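
The smallest case is a 2 × 2 zero-sum game with no pure saddle point. Here the minimax mixed strategies have a closed form: each player mixes so that the opponent's two pure strategies yield the same expected payoff. Take row-player payoffs [[a, b], [c, d]] and let D = a - b - c + d. The row player plays row 1 with probability (d - c)/D, the column player plays column 1 with probability (d - b)/D, and the value of the game is (ad - bc)/D. The sketch assumes integer payoffs and the no-saddle-point case. For matching pennies it gives 1/2, 1/2, and 0. The function name is illustrative.

```python
from fractions import Fraction


def solve_2x2(a: int, b: int, c: int, d: int):
    """Minimax mixed strategies for row payoffs [[a, b], [c, d]].

    Assumes the game has no pure-strategy saddle point, so both players mix.
    Returns (p, q, value): p = P(row 1), q = P(column 1), and value = the
    row player's expected payoff.
    """
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: a - b - c + d is zero")
    p = Fraction(d - c, denom)  # makes the column player indifferent
    q = Fraction(d - b, denom)  # makes the row player indifferent
    value = Fraction(a * d - b * c, denom)
    return p, q, value


if __name__ == "__main__":
    print(solve_2x2(1, -1, -1, 1))  # matching pennies: p = 1/2, q = 1/2, value 0
    print(solve_2x2(3, -1, -2, 1))
```

Because each player's mixture leaves the opponent indifferent between pure strategies, discovering the opponent's mixed strategy gives no incentive to change, which is the stability property stated above.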

Games involving more than two people introduce a new element: the possibility that two or more players will cooperate to beat the rest. Such a cooperative agreement is called a coalition, and it frequently involves side-payments among members of the coalition. The method of analysis for three-or-more-person games is to consider all possible coalitions and to solve the game for each coalition on the principles of a two-person game. This works fairly well for three-person games, but gets more complicated
and less satisfactory for still more 
people. 
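The coalition analysis described above starts by listing every way of dividing the players into a coalition and its opponents, each division being treated as a two-person game. The sketch below only enumerates those divisions; it does not solve the resulting games or the side-payment problem, and its function name is hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Split = Tuple[FrozenSet[str], FrozenSet[str]]


def coalition_splits(players: List[str]) -> List[Split]:
    """Every division of the players into a coalition S and the opposing coalition N - S.

    Each split is the two-person game "S against everyone else". A split and its
    mirror image are the same game, so one player is pinned to S to count it once;
    the split with an empty opposing side is skipped.
    """
    everyone = frozenset(players)
    pinned = players[0]
    splits = []
    for size in range(1, len(players)):
        for members in combinations(players, size):
            coalition = frozenset(members)
            if pinned in coalition:
                splits.append((coalition, everyone - coalition))
    return splits


if __name__ == "__main__":
    for s, rest in coalition_splits(["A", "B", "C"]):
        print(sorted(s), "vs", sorted(rest))
    for n in range(3, 9):
        players = [f"P{i}" for i in range(n)]
        print(n, "players:", len(coalition_splits(players)), "two-person games")
```

For n players it produces 2^(n-1) - 1 divisions (3 for three players, 127 for eight), one concrete reason the analysis grows more cumbersome as players are added.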

This is the end of this exposition of the content of von Neumann and Morgenstern's book. It is of course impossible to condense a tremendous and difficult book into one page. The major points to be emphasized are these: the theory of games is not a model of how people actually play games (some game theorists will disagree with this), nor is it likely to be of any practical use in telling you how to play a complicated game; the crux of the theory of games is the principle of choosing the strategy which minimizes the maximum expected financial loss; and the theory defines a solution of a game as a set of imputations which satisfies this principle for all players.

Assumptions. In their book von Neumann and Morgenstern say "We have . . . assumed that [utility] is numerical . . . substitutable and unrestrictedly transferable between the various players" (197, p. 604). Game theorists disagree about what this and other similar sentences mean. One likely interpretation is that they assume utility to be linear with the physical value of money involved in a game and to be interpersonally comparable. The linear utility curves seem to be necessary for solving two-person games; the interpersonal comparability is used for the extension to n persons. Attempts are being made to develop solutions free of these assumptions (176).

Statistical decision functions. Von Neumann (195) first used the minimax principle in his initial publication on game theory, in 1928. Neyman and Pearson mentioned its applicability to statistical decision problems in 1933 (144). Wald (198), who prior to his recent death was the central figure in the statistical decision-function literature, first seriously applied the minimax principle to statistical problems in 1939. Apparently, all these uses of the principle were completely independent of one another.

After Theory of Games and Economic Behavior appeared in 1944, Wald (198) reformulated the problem of statistical decision making as one of playing a game against Nature.





THEORY OF DECISION MAKING 


The statistician must decide, on the basis of observations which cost something to make, between policies, each of which has a possible gain or loss. In some cases, all of these gains and losses and the cost of observing can be exactly calculated, as in industrial quality control. In other cases, as in theoretical research, it is necessary to make some assumption about the cost of being wrong and the gain of being right. At any rate, when they are put in this form, it is obvious that the ingredients of the problem of statistical decision making have a gamelike sound. Wald applied the minimax principle to them in a way essentially identical with game theory.

A very frequent criticism of the minimax approach to games against Nature is that Nature is not hostile, as is the opponent in a two-person game. Nature will not, in general, use a minimax strategy. For this reason, other principles of decision making have been suggested. The simple principle of maximizing expected utility (which is the essence of the Bayes's theorem [15, 198] solution of the problem) is not always applicable because, even though Nature is not hostile, she does not offer any way of assigning a probability to each possible outcome. In other words, statistical decision making is a problem of uncertainty, rather than of risk. Savage has suggested the principle of minimaxing regret, where regret is defined as the difference between the maximum which can be gained under any strategy given a certain state of the world and the amount gained under the strategy adopted. Savage believes (170, also personal communication) that neither von Neumann and Morgenstern nor Wald actually intended to propose the principle of minimaxing loss; they confined their discussions to cases in which the concepts of minimax loss and minimax regret amount to the same thing. Other suggested principles are: maximizing the maximum expected gain, and maximizing some weighted average of the maximum and minimum expected gains (93). None of these principles commands general acceptance; each can be made to show peculiar consequences under some conditions (see 170).
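The competing principles can be compared on a single table of gains for a statistician's policies under different states of Nature. The sketch below uses a made-up 3 x 3 table and hypothetical function names; gains are used throughout, so minimaxing loss appears as maximizing the worst-case gain. The weighted-average rule is often called the Hurwicz criterion, and the Bayes rule is shown with an assumed uniform prior, which is exactly the kind of probability assignment the text says Nature does not supply.

```python
from typing import Dict, List, Sequence

Table = Dict[str, List[float]]  # policy -> gains, one entry per state of Nature


def maximin(table: Table) -> str:
    """Minimax loss: best worst-case gain (loss taken as negative gain)."""
    return max(table, key=lambda act: min(table[act]))


def maximax(table: Table) -> str:
    """Maximize the maximum gain."""
    return max(table, key=lambda act: max(table[act]))


def hurwicz(table: Table, alpha: float) -> str:
    """Maximize alpha * (best gain) + (1 - alpha) * (worst gain), 0 <= alpha <= 1."""
    return max(table, key=lambda act: alpha * max(table[act]) + (1 - alpha) * min(table[act]))


def minimax_regret(table: Table) -> str:
    """Savage: regret = best gain possible in a state minus the gain actually obtained."""
    best_per_state = [max(gains) for gains in zip(*table.values())]

    def worst_regret(act: str) -> float:
        return max(best - got for best, got in zip(best_per_state, table[act]))

    return min(table, key=worst_regret)


def bayes(table: Table, prior: Sequence[float]) -> str:
    """Maximize expected gain; requires a probability for each state of Nature."""
    return max(table, key=lambda act: sum(p * g for p, g in zip(prior, table[act])))


if __name__ == "__main__":
    # Hypothetical gains for three policies under three states of Nature.
    gains = {"A": [6, 6, 1], "B": [3, 3, 3], "C": [9, 0, 2]}
    print("minimax loss  :", maximin(gains))
    print("minimax regret:", minimax_regret(gains))
    print("maximax       :", maximax(gains))
    print("Hurwicz (0.5) :", hurwicz(gains, 0.5))
    print("Bayes, uniform:", bayes(gains, [1 / 3] * 3))
```

With these numbers the rules disagree: minimax loss picks B, minimax regret and the uniform-prior Bayes rule pick A, and maximax and the Hurwicz rule with alpha = 0.5 pick C.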

Experimental games. The concepts of the theory of games suggest a new field of experimentation: How do people behave in game situations? Such experimentation would center on the development of strategies, particularly mixed strategies, and, in three-or-more-person games, on the development of coalitions and on the bargaining process. You should remember that the theory of games does not offer a mathematical model predicting the outcomes of such games (except in a few special cases); all it does is offer useful concepts and language for talking about them, and predict that certain outcomes will not occur.

A few minor experiments of this kind have been conducted by Flood, a mathematician, while he was at Rand Corporation. He usually used colleagues, many of whom were experts in game theory, and secretaries as subjects. The general design of his experiments was that a group of subjects were shown a group of desirable objects on a table, and told that they, as a group, could have the first object they removed from the table, and that they should decide among themselves which object to choose and how to allocate it. In the first experiment (64) the allocation problem did not arise because enough duplicate objects were provided so that each subject could have one of the kind of object the group selected. The subjects were Harvard undergraduates, and the final selection was made by negotiation and voting. In the second experiment (65), in which the subjects were colleagues and secretaries, a long negotiation process eliminated some of the objects, but a time limit forced a selection by lot from among the rest. Further negotiations to solve the allocation problem were terminated by a secretary, who snatched the object, announced that it was hers, and then tried to sell it. No one was willing to buy, so the experiment terminated. Other experiments (66, 67) showed that coalitions sometimes form, that a sophisticated subject could blackmail the group for an extra side-payment by threatening to change his vote, and that the larcenous secretary, having succeeded once, had to be physically restrained in subsequent sessions to prevent more larceny. The general conclusion suggested by all these experiments is that even experts on game theory are less rational and more conventional than game theory might lead experimenters to expect.

Psychological comments. The most nutritive research problems in this area seem to be the social problems of how bargaining takes place. Flood's experiments left bargainers free and used physical objects, whose utilities probably vary widely from subject to subject, as stimuli to bargain over. This is naturalistic, but produces data too complex and too nonnumerical for easy analysis. A simpler situation would be better: one in which the possible communications from one bargainer to another are limited (perhaps by means of an artificial vocabulary), in which the subjects do not see one another, and in which the object bargained over is simple, preferably being merely a sum of money. Physical isolation of one subject from another would make it possible to match each subject against a standard bargainer, the experimenter or a stooge, who bargains by a fixed set of rules that are unknown to the subject. Flood (personal communication) is conducting experiments of this sort. For three-or-more-person games, Asch's (16) technique of using a group consisting of only one real subject and all the rest stooges might well be used. It would be interesting, for instance, to see how the probability of a coalition between two players changes as the number and power of players united against them increase.

The theory of games is the area among those described in this paper in which the uncontrolled and casually planned "pilot experiment" is most likely to occur. Such experiments are at least as dangerous here as they are in the area of risky decision making. Flood's results suggest that it is especially important to use naive subjects and to use them only once, unless the effects of expertness and experience are the major concern of the experiment.


SUMMARY 


For a long time, economists and others have been developing mathematical theories about how people make choices among desirable alternatives. These theories center on the notion of the subjective value, or utility, of the alternatives among which the decider must choose. They assume that people behave rationally, that is, that they have transitive preferences and that they choose in such a way as to maximize utility or expected utility.

The traditional theory of riskless choices, a straightforward theory of utility maximization, was challenged by the demonstration that the mathematical tool of indifference curves made it possible to account for riskless choices without assuming that utility could be measured on an interval scale. The theory of riskless choices predicted from indifference curves has been worked out in detail. Experimental determination of indifference curves is possible, and has been attempted. But utility measured on an interval scale is necessary (though not sufficient) for welfare economics.

Attention was turned to risky choices by von Neumann and Morgenstern's demonstration that complete weak ordering of risky choices implies the existence of utility measurable on an interval scale. Mosteller and Nogee experimentally determined utility curves for money from gambling decisions, and used them to predict other gambling decisions. Edwards demonstrated the existence of preferences among probabilities in gambling situations, which complicates the experimental measurement of utility. Coombs developed a model for utility and subjective probability measured on an ordered metric scale, and did some experiments to test implications of the model.
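One simple way to see how an ordering of risky choices yields an interval-scale utility is the standard-gamble construction: fix the utilities of the worst and best outcomes at 0 and 1, and take the utility of an intermediate amount to be the probability at which the subject is indifferent between that amount for certain and the best-or-worst gamble. The sketch below illustrates that construction under stated assumptions; it is not Mosteller and Nogee's actual procedure, the indifference values and function names are hypothetical, and it ignores the probability preferences Edwards reported, which is precisely what complicates real measurement.

```python
from typing import Dict, List, Tuple

Gamble = List[Tuple[float, float]]  # (probability, amount of money)


def utilities_from_standard_gambles(worst: float, best: float,
                                    indifference: Dict[float, float]) -> Dict[float, float]:
    """Build a utility scale with u(worst) = 0 and u(best) = 1.

    indifference[x] = p records that the subject is indifferent between x for
    certain and a gamble paying `best` with probability p and `worst` otherwise;
    expected-utility theory then implies u(x) = p * 1 + (1 - p) * 0 = p.
    """
    u = {worst: 0.0, best: 1.0}
    u.update(indifference)
    return u


def expected_utility(gamble: Gamble, u: Dict[float, float]) -> float:
    return sum(p * u[x] for p, x in gamble)


def predict_choice(g1: Gamble, g2: Gamble, u: Dict[float, float]) -> int:
    """Predict 1 or 2 (the gamble with higher expected utility), or 0 for a tie."""
    eu1, eu2 = expected_utility(g1, u), expected_utility(g2, u)
    if abs(eu1 - eu2) < 1e-12:
        return 0
    return 1 if eu1 > eu2 else 2


if __name__ == "__main__":
    # Hypothetical indifference judgments for a subject who dislikes risk.
    u = utilities_from_standard_gambles(0, 100, {25: 0.40, 50: 0.65})
    sure_25 = [(1.0, 25)]
    long_shot = [(0.3, 100), (0.7, 0)]  # higher expected money (30 vs 25)
    print(predict_choice(sure_25, long_shot, u))
    # An interval scale: a positive linear transformation gives the same prediction.
    u_rescaled = {x: 3 * v + 7 for x, v in u.items()}
    print(predict_choice(sure_25, long_shot, u_rescaled))
```

Here the sure 25 is predicted over the long shot even though the long shot has the higher expected money value, and rescaling utility to 3u + 7 leaves the prediction unchanged, as an interval scale requires.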

Economists have become worried about the assumption that choices are transitive. Experiments have shown that intransitive patterns of choice do occur, and so stochastic models have been developed which permit occasional intransitivities.
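A common way to let a model tolerate occasional intransitive choices is to state transitivity in terms of choice proportions rather than single choices. The sketch below checks one such condition, weak stochastic transitivity (if x is chosen over y at least half the time, and y over z at least half the time, then x should be chosen over z at least half the time). The choice proportions and function names are hypothetical, and this is only one of several stochastic conditions that could be used.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def p_prefers(p: Dict[Pair, float], x: str, y: str) -> float:
    """Proportion of trials on which x was chosen over y.

    Each unordered pair is stored once; the reverse ordering is the complement.
    """
    return p[(x, y)] if (x, y) in p else 1.0 - p[(y, x)]


def weak_stochastic_violations(items: List[str],
                               p: Dict[Pair, float]) -> List[Tuple[str, str, str]]:
    """Triples (x, y, z) with P(x>y) >= .5 and P(y>z) >= .5 but P(x>z) < .5."""
    violations = []
    for x, y, z in permutations(items, 3):
        if (p_prefers(p, x, y) >= 0.5 and p_prefers(p, y, z) >= 0.5
                and p_prefers(p, x, z) < 0.5):
            violations.append((x, y, z))
    return violations


if __name__ == "__main__":
    # Hypothetical choice proportions. Individual trials can still be intransitive
    # (a beats b, b beats c, c beats a) without violating the stochastic condition.
    orderly = {("a", "b"): 0.7, ("b", "c"): 0.8, ("a", "c"): 0.9}
    cyclic = {("a", "b"): 0.6, ("b", "c"): 0.6, ("c", "a"): 0.6}
    print(weak_stochastic_violations(["a", "b", "c"], orderly))
    print(weak_stochastic_violations(["a", "b", "c"], cyclic))
```

The orderly proportions produce no violations, while the cyclic ones violate the condition in all three rotations of the cycle, which is the kind of systematic intransitivity such a model rules out.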

The theory of games presents an elaborate mathematical analysis of the problem of choosing from among alternative strategies in games of strategy. This paper summarizes the main concepts of this analysis. The theory of games has stimulated interest in experimental games, and a few bargaining experiments which can be thought of in game-theoretical terms have been performed.

All these topics represent a new and rich field for psychologists, in which a theoretical structure has already been elaborately worked out and in which many experiments need to be performed.

Part I

SOCIAL PSYCHOLOGICAL DATA


Five years after the report on men, the publication of Sexual Behavior in the Human Female by Kinsey and his collaborators provides the companion results on women. It provides the greatest wealth of data ever accumulated on the sexual behavior of a particular population, American women, and is therefore a factual contribution of great interest to psychologists. To social psychologists and those interested in the relations of man to society, the new volume underscores and rounds out a basic finding of the first volume, namely, the widespread occurrence of a variety of unsanctioned forms of sexual behavior among both males and females, and the disparity between societal norms and such behavior. While the novelty of this finding may have worn off slightly, the new volume has other compensating features. The simple fact that it is the second report suggests that the researchers may well have benefited from their past experience, with consequent improvements in the quality of their findings. This not only provides opportunity for improvement of methods, but also permits formerly neglected areas of inquiry to be covered. Thus, in the earlier volume the emotional and attitudinal accompaniments of various patterns of behavior were hardly touched. We shall note below that considerable information of this type has been added, making the volume more relevant to psychologists.

¹ Kinsey, A. C., Pomeroy, W. B., Martin, C. E., & Gebhard, P. H. Sexual behavior in the human female. Philadelphia: Saunders, 1953. Pp. xxx + 842. $8.00.

² This multiple review was divided so that H. Hyman was concerned with Part I, the social psychological data, and J. E. Barmack confined himself to Part II, the biological material. The reviewers worked independently of each other.

indebtedness to Paul B. Sheatsley, with whom he collaborated on a longer essay on the Kinsey reports (cf. Geddes, D. P. [Ed.] An analysis of the Kinsey reports on sexual behavior in the human male and female. New York: New American Library of World Literature, and E. P. Dutton, 1954).

In addition, the new volume is not merely a descriptive study like the first; Kinsey now can compare the behavior of males and females. The comparative profiles of the sexual behavior of men and women provide evidence on the degree of "complementarity" of partners. It is such data, rather than separate descriptive data on the behavior of one group or the other, that are relevant to profound questions of conflict in marital and interpersonal relations. And much of the new volume exploits these materials; each chapter ends with a comparison of men and women, and a special chapter is devoted to general psychological differences between the sexes. Further, by parallel analyses of the determinants of sexual patterns in men and women, there emerge data on the differential and lesser impact of social factors on the sexual development of women, a particularly relevant body of materials for studies of socialization. Parallel to these findings are other new materials relevant to general problems of learning theory and psychological development. In both volumes, sexual behavior was examined in relation to various group memberships. But now, supplementary analyses of a more dynamic sort are introduced. Kinsey studies the relation of the individual's early sexual patterns to his subsequent sexual behavior and adjustment. Obviously, such information is valuable both to clinicians and to theorists.

The substantive contributions outlined above can make the new volume of great importance, provided the methods employed were such as to yield sound findings. It is therefore to the question of methods that this review will be directed.

This question cannot be answered briefly or in any simple way. The official committee appointed by the American Statistical Association to review the statistical methods employed in the first Kinsey report did not issue its report until three years after it had been established. With such distinguished members as W. G. Cochran, Frederick Mosteller, and John W. Tukey, it not only examined the published work, but also spent a period of time at the Institute at Indiana University and engaged in certain additional researches and examination of other literature. The detailed evaluation will require monographic publication, and the summary findings will require some 43 pages of journal treatment. It is of note that this expert, judicial, and reasonable body remarks:


It would have been possible to write two factually correct reports, one of which would leave the impression with the reader that KPM's work was of the highest quality, the other that the work was of poor quality and that the major issues were evaded. We have not written either of these extreme reports (1, p. 674).


Evaluation of the methods will require a brief discussion of each stage in the new inquiry: the nature of the sample of 5,940 white American females who provide the major data; the quality of the interviewing; the adequacy of the coverage of the interview schedule and its construction; and analysis of the data. We must also allude to the methods in the earlier volume. Since much of the significance of the new work derives from the comparative data on male and female, one must examine these data in an attempt to discover whether or not the differences demonstrated are an artifact of some change in method.

One reasonable standard to apply to this volume is the level of improvement in methods, as compared with the earlier report. The novelty of the earlier investigation was bound to create error. But the many criticisms of the first volume, and the additional experience, should have led to considerable improvement. That this is the case seems beyond question. Certain major defects have been remedied. Thus, there is much greater attention to error and more explicit reference to limitations. However, it should be noted that there are limits to the amount and kinds of improvement that could be achieved in the second volume. While the publication date may suggest that the new work had the benefit of some years of additional experience, this is clearly not the case. Both inquiries started in 1938 and "throughout the years, female histories have been added at approximately the same rate as the histories of males." Thus, Kinsey's sampling and interviewing procedures cannot have been altered appreciably. What is subject to change are the analytic procedures since, in the processing and reporting of female data, Kinsey had the benefit of additional time. However, even here changes cannot be made lightly. For the comparative findings on males and females to be valid, the methods must be comparable. Bearing in mind such limitations, we can clearly see that Kinsey has made considerable improvement.

The sample. Critics of the first volume questioned Kinsey's generalizations to the whole American male population from his sample data. While the critics differed as to which procedures were inadmissible, there was much agreement that the generalizations departed considerably from proper caution. In the present volume, Kinsey is cautious about generalizations, and he makes no attempt to extend his findings to all American women. A telling criticism of the earlier report was that no reader could determine just how representative of any particular universe the sample was, because nowhere was there any systematic account of the distribution of the 5,300 males in terms of standard characteristics. This defect has now been remedied by means of two elaborate tables which present detailed information on the composition of the 5,940 white females.

While Kinsey has this time provided us with these two tables and has carefully refrained from making unsupported generalizations, it can be demonstrated that the sample does not accurately represent the white female population of the nation. Some critics may condemn the whole investigation, arguing that a better method could have been employed, one which would have permitted unqualified statements about the population as a whole. The question is debatable. Kinsey argues that any scientific sampling of predesignated respondents would have been impossible because a high proportion of such selected individuals could not be counted upon to consent to the interview and to answer the questions truthfully. In the present volume, all respondents were "volunteers" in the sense that they allowed themselves to be interviewed. However, histories were not accepted simply because a subject expressed the desire to be interviewed; all the respondents were selected because of their importance to the over-all sampling plan. Kinsey works primarily through formal or informal groups and makes ingenious and skilful use of social pressures. He describes some group contacts which required a year or two of cultivation. Indeed, the immense diversity of groups he has worked with is a tribute to his persistence and skill.

HERBERT HYMAN AND JOSEPH E. BARMACK

Nevertheless, one must still raise grave questions concerning the sample. For one thing, a substantial portion of the U. S. population claims no organizational membership whatever. Such individuals would likely be excluded from the samples, and there is evidence that they differ in many respects, perhaps including sexual behavior, from those who do belong to groups and associations. In addition, Kinsey could seldom interview all or even nearly all of the members of any particular group. No figures are given on the proportion of "refusals" to be interviewed, though Kinsey does state that 15% of the female sample is composed of "100% groups" (that is, groups in which every member was interviewed) and "we may report that a considerable proportion of the rest of the sample has been drawn from groups in which something between 50 and 90 per cent of all the members had contributed histories" (p. 30). Kinsey further states: "Such coverage should provide a good sample of those particular groups" (p. 30), but this is a dubious assumption. There is no reason to believe that the histories of the 10-50% who do not contribute would necessarily agree with those who do. A principal finding of the Kinsey reports (to many, their most controversial and significant finding) has been the high incidences and frequencies of both sanctioned and unsanctioned forms of sexual behavior reported by both men and women. A question which cannot be answered concerns the possibility that those women who agree to an interview might be less inhibited, and consequently may engage in a greater variety of sexual activities, and may do so more frequently.

Aside from this consideration, it is puzzling that Kinsey's sample is not more nearly representative of the population in respect to a number of gross characteristics. Perhaps the most striking deficiency is the failure to interview enough females whose education did not extend beyond grammar school. Some of the most interesting differences in the sexual behavior of the males studied were those attributed to education, and one would have expected comparative data in the present study. Instead, only 3% of the women interviewed had not attended high school, and Kinsey is restrained from making any substantial statements about this important stratum of the population. On the other hand, 75% of the total female sample had attended college, and 19% had gone on to postgraduate work. The significance of Kinsey's findings must be considerably diluted when we consider that three-fourths of his sample came from the 13% of American women who have gone to college, and that the 40% of American women who never went beyond the eighth grade comprised only 3% of those he studied. Similarly, the sample is deficient in the older age groups and is heavily weighted with women in their 30's. Some of these sampling biases have been deliberate in order to build up sufficient cases among particular subgroups for special analyses. Thus, although Jews represent only about 4% of the U.S. population, they account for more than one-quarter of Kinsey's sample. In order to make statements about religion and sexual behavior, Kinsey did require sufficient Jews to allow thorough analysis of this religious group. In the case of the age and education biases, however, this reason does not seem relevant, and one can only wonder at the failure to achieve more adequate representation in terms of these two important factors.
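The scale of the education imbalance can be made concrete by asking what weights would be needed to bring the sample's education distribution into line with the national figures quoted above. The sketch below is not a procedure used by Kinsey or by the reviewers; it assumes three exhaustive education strata (with "some high school, no college" taken as the remainder of each distribution), uses Kish's approximate effective-sample-size formula, and its names are hypothetical. Such weighting would correct only the education margin, not any bias within strata.

```python
from typing import Dict

# Education shares quoted in the review. The middle stratum (some high school,
# no college) is taken as the remainder of each distribution: an assumption.
POPULATION = {"grammar": 0.40, "high_school": 0.47, "college": 0.13}
SAMPLE = {"grammar": 0.03, "high_school": 0.22, "college": 0.75}
SAMPLE_SIZE = 5940


def poststratification_weights(pop: Dict[str, float],
                               sample: Dict[str, float]) -> Dict[str, float]:
    """Weight per respondent in each stratum: population share / sample share."""
    return {s: pop[s] / sample[s] for s in pop}


def kish_efficiency(pop: Dict[str, float], sample: Dict[str, float]) -> float:
    """Effective-sample-size ratio n_eff / n under Kish's approximation.

    n_eff = (sum w)^2 / sum w^2 over respondents; with w_s = P_s / p_s this
    reduces to 1 / sum(P_s**2 / p_s).
    """
    return 1.0 / sum(pop[s] ** 2 / sample[s] for s in pop)


if __name__ == "__main__":
    for stratum, w in poststratification_weights(POPULATION, SAMPLE).items():
        print(f"{stratum:12s} weight {w:6.2f}")
    eff = kish_efficiency(POPULATION, SAMPLE)
    print(f"efficiency {eff:.3f}, effective n about {SAMPLE_SIZE * eff:.0f} of {SAMPLE_SIZE}")
```

With the quoted shares, a grammar-school respondent would carry roughly 13 times the weight of an average respondent and a college respondent about one-sixth, and the effective sample size falls to roughly 930 of the 5,940 interviewed.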

Obviously one should not dismiss Kinsey's study simply because he cannot speak authoritatively about every group in the population. However, one feature of the present volume does raise a new question about the sampling method: the comparative analysis of male and female behavior. Many statistical data demonstrating sex differences are presented, and their impact is powerful. But a problem that continually arises in the interpretation of these findings, and one that is not stressed by Kinsey, is whether the differences represent true sex differences or merely differential sampling biases in the two sets of data. If the biases were equivalent for both men and women, the stated differences would nonetheless be true, though limited in generality. But if the male sample is biased in one respect (e.g., too many uneducated) and the females in an opposite respect (too many educated), then the interpretation of any differences found in the data becomes very difficult. Since the over-all composition of the male sample was never reported in the first volume, the facts cannot be readily established. Attempts to infer this composition from scattered tables suggest that some of the sex differences may be artifacts of differential sampling biases. Thus, almost 30% of the women are Jewish, while the best guess for the men would place only about 15% in that class. Only 3% of the women are in the grammar school group, but in the male sample this proportion is much larger. If religious affiliation and education are significantly correlated with both male and female sexual behavior, the differential biases in the two samples could well explain some of the aggregate sex differences reported.
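The reviewers' point about differential biases can be made concrete with a small composition example. In the sketch below the stratum rates are invented and deliberately identical for men and women; the female education shares are those quoted in this review, the male shares are invented to reflect a "much larger" grammar-school group, and the standard population uses the national shares quoted earlier (with the middle stratum as the remainder). All names are hypothetical.

```python
from typing import Dict

Shares = Dict[str, float]  # stratum -> proportion of a sample (sums to 1)
Rates = Dict[str, float]   # stratum -> proportion reporting some behavior


def crude_rate(rates: Rates, composition: Shares) -> float:
    """Aggregate rate: stratum rates averaged with the sample's own composition."""
    return sum(rates[s] * share for s, share in composition.items())


# Hypothetical stratum rates, set identical for men and women so that any
# aggregate "sex difference" below comes from composition alone.
RATES_WOMEN = {"grammar": 0.60, "high_school": 0.50, "college": 0.40}
RATES_MEN = dict(RATES_WOMEN)

FEMALE_SAMPLE = {"grammar": 0.03, "high_school": 0.22, "college": 0.75}  # quoted shares
MALE_SAMPLE = {"grammar": 0.40, "high_school": 0.35, "college": 0.25}    # invented
STANDARD = {"grammar": 0.40, "high_school": 0.47, "college": 0.13}       # quoted population

if __name__ == "__main__":
    crude_w = crude_rate(RATES_WOMEN, FEMALE_SAMPLE)
    crude_m = crude_rate(RATES_MEN, MALE_SAMPLE)
    print(f"crude: women {crude_w:.3f}, men {crude_m:.3f}, gap {crude_m - crude_w:+.3f}")
    # Direct standardization: apply both sets of rates to one common composition.
    std_w = crude_rate(RATES_WOMEN, STANDARD)
    std_m = crude_rate(RATES_MEN, STANDARD)
    print(f"standardized: women {std_w:.3f}, men {std_m:.3f}, gap {std_m - std_w:+.3f}")
```

With these numbers the crude rates are 0.428 for women and 0.515 for men, an apparent sex difference produced entirely by composition; applying both sets of rates to one standard composition (direct standardization) removes it.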

However, the criticism is not as serious as it might seem. Kinsey presents much evidence that female sexual behavior is less affected by many of the social characteristics studied than is male behavior. Thus, biases with respect to these factors could not account for very much of the difference reported. Furthermore, Kinsey presents in summary form in the female volume, and in detail in the two separate volumes, the male and female findings for various subgroups, so that the careful reader can estimate the degree to which such biases may affect the over-all differences cited. Nevertheless, any differential biases with respect to characteristics other than those tabulated might still remain as an unknown source of error in the comparisons. Thus, Kinsey states that female sexual behavior is subject to much greater variability than the male. Since some other, unknown characteristics must determine the variation among females, any such differential sampling biases might well contribute to a spurious finding. Kinsey does present some empirical evidence bearing on the general problem when, for example, he shows a close agreement between the male and female findings with respect to marital coitus. The frequencies of marital coitus would, by definition, have to be identical in the aggregate for the two samples if the samples were equivalent. In many instances, a very high degree of agreement is reported, but for some few aspects of behavior the reports of the two samples do not closely conform. The speculation that the reported sex differences are artifacts is not generally warranted, but some limited number of them may well reflect the net operation of all the differential biases in the sampling of the two groups.

The interview schedule and the interviewers. As in the case of the males, approximately two hours were spent interviewing each subject and covering between 300 and 500 items of information. No questionnaire was reproduced in either volume. This effectively prevents one from appraising the types of questions that Kinsey asked and the manner in which they were presented. Kinsey argues persuasively against the belief that "standard questions fed through diverse human machines can bring standard answers," and he used no questionnaire at all in the usual sense of the word. The items to be covered and the definition of each item were standard, but the wording and order of the particular questions were varied in the most meaningful manner for each respondent. Since the objective conditions of the interview were not uniform, one can never be certain that the differences found between individuals and groups are not due, at least in part, to differences in the wording and order of the questions. This consideration takes on added weight in connection with the many comparisons between males and females. Here, the question of consistency of interviewing procedure might be raised, but unfortunately can never be answered.

Offsetting this possible danger was the fact that all of the interviewing was done by a very small staff of highly trained interviewers. Eighty-five per cent of the male interviews, and approximately 80% of the female, were collected by only two interviewers. Only six interviewers were used for the male volume, and only four for the female study. Each of these interviewers was thoroughly trained and in continuing contact with the others. When their years of experience in handling interviews with every sort of individual are considered, one can assume that procedures were more uniform than might appear, in spite of the large measure of freedom to exercise judgment from case to case.

Kinsey presents a variety of evidence bearing on the quality of the interview data. One hundred and twenty-four females were reinterviewed after a minimum time lapse of 18 months. Compared with other studies of the reliability of interview reports over short spans of time and on issues much less emotional than sexual behavior, the extent of agreement between the first and second interviews is remarkably high (for a summary of such studies, see 2). Validity of response is examined by comparing the results obtained from separate interviews with 706 pairs of spouses; here, by definition, any disagreement in reports on marital history and behavior would constitute invalidity. Again, the general findings are good and, compared with validity studies in the past interview literature, they are often amazingly good (see 3 for a general summary). It should be noted, however, that such validity checks have been confined to spouses whose joint sexual behavior is more or less "sanctioned." Whether or not the same degree of validity holds for less sanctioned areas of behavior, such as the homosexual or extramarital, is not demonstrated.

One consideration in evaluating the findings was the fact that all of the interviewers were males. There is considerable evidence from many types of studies that the group membership (e.g., color, class) of the interviewer may seriously affect the responses. And men and women have been observed to give different replies, on many questions, depending upon the sex of the interviewer (cf. 2). True, such studies have also shown that the more experienced and capable interviewers have been able to some extent to overcome the effects, and we would expect such effects to be at a minimum with interviewers of the caliber of Kinsey and his associates. But it has been established that statistical results can vary by as much as 10 percentage points, depending upon the sex of the interviewer, and one has no way of measuring the possible effects in this study.

The quality of the reports. A variety of criticisms of Kinsey’s research have been advanced, and will continue to be advanced, on sheer axiomatic grounds. For example, the claim is made that data concerning the most intimate of human experiences, a portion of which may be socially disapproved or remote in time, cannot, when collected by straightforward question-and-answer procedures in a relatively short interview, possibly be accurate.

With respect to criticisms based on axiomatic grounds, however, it is proper that the burden of proof rest on the critics. Kinsey himself cites considerable empirical support for the quality of his material. Reliability is high, as measured by reinterviews after a considerable lapse of time. Agreement in the independent reports of spouses concerning their joint behavior is high. There is some evidence that even events which occurred many years ago and are remote in the respondent's memory are nevertheless enumerated with considerable accuracy. Thus, Kinsey’s findings on age of menarche, and on other indices of physical development, as reported retrospectively by his respondents, agree closely with independent studies based on direct observation of these matters. High agreement is demonstrated between many of his findings and the findings of earlier independent investigators who relied on different techniques. The material from the reinterviews and from the interviews with spouses concerning matters remote in time is somewhat less reliable than their reports of more recent events, but is still surprisingly good.


Terman, in his distinguished review of the earlier volume, noted that Kinsey had not taken proper account of possible memory errors in the way he processed and analysed his data (4). Terman did not reject the material collected by remote recall, but argued against Kinsey’s procedure of describing sexual behavior at given ages by lumping the reports of respondents actually at that age at the time of interview with the reports of those who had to recall the same period from their early lives.

In the computation of mean frequency of 
masturbation at age 15, for example, the 
memory report of a 50-year-old counts as 
heavily as the report of a 15-year-old ....
It would have been helpful if he had shown in 
the tables what proportion of the N at that 
given age level was accounted for by subjects 
at or near that age (4, pp. 446, 450). 


In the present volume, Kinsey 
again lumps retrospective and cur- 
rent reports, but he has this time pro- 
vided basic tables showing the age 
distribution at time of report, not 
only for his total sample of females, 
but also for particular subgroups. 
The reader can now weigh for himself 
the vulnerability of certain of the 
data in the light of possible memory 
distortions. 

Types of data collected. Critics have often concerned themselves with Kinsey's definition of the domain of sexual behavior and his emphasis on orgasm as the unit of measurement of such behavior. By and large, Kinsey has approached his problem in these terms. However, it seems proper to distinguish legitimate and explicit research limitations from errors which merit criticism. Kinsey frequently acknowledges that measures such as orgasm do not encompass the entire domain of the problem, but he argues persuasively that such measures were the best he could accomplish at this time. Stronger criticism, however, may be made when conclusions overreach the definition of the domain and the character of his data. Kinsey may be somewhat vulnerable on this score. In his first volume, virtually no direct data were collected on the psychological processes underlying the reported overt male sexual behavior, and without such data some of the findings were uninterpretable. In the present volume, wherein Kinsey seems even more concerned with interpreting the underlying meaning of his findings, with commenting on psychological processes as they differentiate males and females, and with drawing recommendations for social action, there is all the more need for such auxiliary data. Kinsey has become acutely aware of this need. In contrast to the earlier report, the present study makes considerable use of attitudinal and psychological data collected directly from the interview. A few illustrations will show how the new psychological material is a basis for better inferences than the former bare record of overt behavior. Following a discussion of homosexual behavior, for example, there is a tabulation of the degree of regret, if any, that the subjects felt concerning their homosexual experience. Following a discussion of masturbation, evidence is presented on the degree to which psychological disturbance had accompanied the activity. Following a discussion of sexual contacts between adults and children, information is cited on the psychological reactions of the children.

In addition to these new data, there is acute use of auxiliary materials from an immense variety of sources in other fields. Anthropological, biological, psychological, medical, historical, and literary data on sexual behavior are profusely and effectively cited to support the findings and interpretations drawn from Kinsey's own original materials. The expanded use of such supplementary data in this current report appears to be a major contribution toward the placement of Kinsey's work in perspective.

Analytic techniques. The analytic procedures employed for the female report conform closely to the pattern established in the first volume. In general, sexual behavior is described by the incidence and frequency of any experience, and of experience leading to orgasm. These are considered under the following defined types of sexual activity: preadolescent sex play, masturbation, nocturnal dreams, heterosexual petting, premarital coitus, marital coitus, extramarital coitus, postmarital coitus, homosexual contacts, and animal contacts. The individual figures for each type of activity, and the “total sexual outlet” from all types, are then explained by examination of the variations in such behavior according to age, religion, educational status, marital status, etc. These explanatory factors are the same ones used in the analysis of the male data, and the approach proves fruitful, for the differences reported from one group to another greatly increase our understanding of the factors affecting sexual behavior.

These statistical analyses are handled carefully. Kinsey examines his findings for spuriousness and systematically controls factors that might be correlated with the characteristic under study. When he has insufficient cases to pursue such analyses, he is careful to qualify his conclusions. Similarly, he is cautious about imputing causality.

Presentation of the detailed breakdowns by subgroups has additional values. First, it enables the reader to appraise the probable effect of the sampling biases which have been noted above. It is on this basis that Kinsey argues that the heavy preponderance of college women is not overly serious, since, in general, he finds little difference among the high school, college, and graduate school groups. He does, however, state that “major changes might have been introduced into the generalizations if we had had a larger sample of females who had never gone beyond grade school” [italics mine] (p. 57).

The use of the same social characteristics to examine the sexual activity of both males and females enables the reader to compare not only the differences between men and women in the aggregate, but also differences between men and women within specific subgroups, e.g., educated, Catholic, etc. This is obviously a much more precise approach to the evaluation of what is a major social psychological contribution of Kinsey’s second report: the analysis of social conflicts which may arise because of the discrepant sexual patterns of male and female partners. In marriage and in social life generally, people normally interact within limited social groupings. Consequently, the important facts to be adduced are the patterns among men and women who are likely to meet, rather than the patterns for men and women in general.

Although these parallel analyses by the same social groupings are therefore potentially valuable, Kinsey does not seem to make sufficient use of them for their obvious purpose. Most of the conclusions about social conflicts arising from discrepant sexual behavior in men and women seem to be predicated on the aggregate samples. For example, Kinsey finds that the proportion of men who at one time or another engaged in premarital coitus approximates 90%, while among women the figure is 50%. He uses this discrepancy in discussing the rise of the double standard and regards the greater sexuality of young males, compared with young females, as creating social strains. But when the incidence of premarital coitus is examined for separate educational levels, the figure drops to 68% among the college-educated men and rises to 60% among the similarly educated women. Within this particular group, therefore, the sex difference tends to disappear. Unfortunately, Kinsey generally tends to rest his conclusions on the characteristics of the total samples. It should also be noted that while it is theoretically possible to make detailed subgroup comparisons between the sexes, the task is extraordinarily difficult for a reader.

While parallel use of the same social characteristics in the analysis of the behavior of both sexes has very substantial value, it is strange that the female analysis should have been so rigidly restricted to those factors alone. There are other social characteristics of American women, presumably central to their behavior, which are not comprehended at all under the traditional groupings that were used for the male. Obvious factors in female behavior are the presence or absence of motherhood, the care of children, and status as an independent wage earner as contrasted with the role of dependent nonworking wife or daughter. Conceivably, these factors have no relation to sexual behavior, but the presumption that they do is strong. It is surprising that such analyses are not reported in the present volume.

At the outset, it was emphasized that it is not easy to make a simple evaluation of Kinsey’s new work. The report represents, in this reviewer's judgment, a considerable improvement over the first volume. Like all investigations, it has limitations and weaknesses but, as a pioneer investigation of such scope into a very difficult research problem, it represents an unusual achievement.


Certainly, one cannot accept the findings as the ultimate, definitive answer to problems of sexual behavior, but the discrepancy from this research ideal is small. Kinsey's new volume should be regarded as a monumental contribution for which all psychologists can be grateful.

Press, 1954, in press.
3. Parry, H. J., & Crossley, H. Validity of responses to survey questions. Publ. Opin. Quart., 1950, 14, 61-80.
4. Terman, L. M. Kinsey's “Sexual behavior in the human male”: some comments and criticisms. Psychol. Bull., 1948, 45, 443-459.

Part II

THE BIOLOGICAL DATA

Kinsey and his associates present in the third and final part of this volume a comparison of the sexual activity of the male and female “. . . to discover some of the basic factors which account for the similarities and the differences between the two sexes” (p. 567).

The comparison is not restricted to the data reported in the first volume and in the earlier sections of the present one. Instead, it also includes supplementary data and a survey of the literature. The survey is selective and interpretive, and it represents the authors’ conception of the nature of sex differences rather than a mere identification of what is known in this field. By their laborious and extensive study of one of the most important aspects of human activity, Kinsey, Pomeroy, Martin, and Gebhard have earned the interest of a wide audience in what they have to say about sexual life. Their statements need not be supported by any data, provided that their views are represented as theory, opinion, or conception. However, insofar as they claim to have demonstrated the validity of their conceptions, the reader is entitled to question whether or not this is indeed so.

Their writing is skillful and engaging, but their style is more popular than scientific. They overgeneralize from their data. They take cognizance of opposing viewpoints or contradicting data, and then proceed to advance their own viewpoint in complete isolation. A surprising number of important terms remain undefined, e.g., psychological factors, basic sex differences, sexual capacity, and capacity to be conditioned, among others. A lack of conceptual clarity is the consequence. It is necessary to interpret what they mean, since the meaning is often not evident. A good example is the summary statement in their chapter on anatomic factors in sexual response and orgasm.

In brief, we conclude that the anatomic structures which are most essential to sexual response and orgasm are nearly identical in the human female and male. ... If females and males differ sexually in any basic way, those differences must originate in some other aspect of the biology or psychology of the two sexes. They do not originate in any of the anatomic structures which have been considered here (p. 593).


The precise meaning of this state- 
ment is that the erotic topography 
of male and female genitalia will not 
account for the higher reported fre- 
quency of orgasm of male volun- 
teers. 

The authors have singled out difference in orgasm frequency as the “basic” sex difference. They could have considered other differences as “basic,” e.g., gross anatomic differences, differences in degree of aggression, or attitudes toward children. They did not, perhaps because they believed the most valid data they could obtain and interpret were reports about orgasms. Initially, orgasm was represented as the prime criterion of sexual life, but subsequently it became identified with all of it.

There are still other difficulties with the authors’ account of the basic similarities and differences between the two sexes, and these will be discussed within the context of five main propositions. They propose that: (a) sexual responses depend upon a “basic” anatomy which is essentially the same in the female and the male; (b) the physiological accompaniments of orgasm are (with minor exceptions) basically the same for the two sexes; (c) males are more readily aroused by sexual stimuli because they have been conditioned more frequently than the female; (d) males are more frequently conditioned because certain unidentified structural characteristics of the cerebral cortex give them a greater capacity to be sexually conditioned than the female; (e) there are sex differences in changes of frequency of orgasm with age, the levels of the 17-ketosteroids correlating with these differences.

* They define sexual response as those physiological changes which lead to orgasm (p. 594).

They dismiss the contribution of anatomic factors to an understanding of sex differences in orgasm frequency on the basis of the following arguments: (a) the clitoris is the embryological homologue of the penis and is erotically as sensitive; (b) the vagina is relatively insensitive to touch and is therefore erotically unimportant; (c) female homosexuals prefer clitoral stimulation, and they should know what is most stimulating; (d) the preferred female masturbatory technique involves clitoral rather than vaginal stimulation; (e) there is no evidence for sex differences in the distribution of end organs of touch and sensory nerves.

The argument of the embryologic 
homologue is hardly pertinent. The 
fact that genital differences are diffi- 
cult to identify in the embryonic 
stage is no guarantee that functional 
differences will not emerge at a later 
stage. The bills of the shrike and the 
hummingbird are homologous struc- 
tures, but function in quite different 
ways. 

Their denial of erotic importance 
to the vagina is based, in part, on a 
study in which they asked five gyne- 
cologists to test the tactile sensitivity 
of the clitoris and other parts of the 
genitalia of nearly nine hundred 
women. The gynecologists used a 
glass rod applied lightly to test touch 
sensitivity. According to this study, 
the vagina appeared relatively in- 
sensitive to touch. 

The pertinence of such a study hinges on whether the equation of tactile and erotic is acceptable. This equation is questionable for three reasons:

1. Tactile stimulation is always available in clothed males and females without chronic erotic arousal.

2. Even tactile sensitivity may change when genital tissue is engorged by sexual excitement. There is no indication that the gynecological exploration was accompanied by sexual arousal.

3. Subjectively, tactile and erotic sensations are different.

The same data are also used to contradict the view advanced by Freud and others that the processes of maturing psychosexually involve a shift of the location of the dominant erotogenic zone from the clitoris to the vagina. This shift the authors believe to be biologically impossible (p. 584). The submitted evidence has questionable relevance, since what is required is an age comparison to determine whether the clitoris remains the dominant erotogenic zone. No age comparison is provided; rather, they cite as additional support the preference for clitoral instead of vaginal stimulation by women engaged in homosexual and masturbatory activities. This type of testimony can only complicate rather than clarify a problem in which the critical independent variable is psychosexual maturity.

Their whole discussion of anatomical differences in the distribution of end organs of touch, of the density of sensory nerves in the penis and clitoris, in the size of the breast, and of other parts of the body has no bearing on an understanding of the sex difference in orgasm count unless these anatomic factors can be shown to affect orgasm-seeking behavior differentially. This assumption is implied but not examined.
In view of the authors’ belief that the vagina is “of minimum importance in contributing to the erotic responses of the female” (p. 592), and since coitus necessarily involves vaginal stimulation, it is curious that they did not conclude that anatomic factors contribute to sex differences in orgasm count.

In discussing the physiology of sexual response and orgasm, they review certain muscular, vascular, and glandular responses before, during, and after orgasm. They base their review on a survey of the literature, references to some of their own interview data, and what is apparently a substantial amount of undocumented personal observation. While cited studies show rises in pulse rate, blood pressure, peripheral blood flow, forced respiration, salivary secretions, general muscular activity, etc., during orgasm, the evidence bearing on sex differences in these responses at the human level is variable and negligible. The authors conclude that female and male are quite alike, as far as the data yet show, in regard to all of these changes. While the cited physiological accompaniments of orgasm have intrinsic interest, their relevance to orgasm-seeking behavior is again assumed and not critically examined.

The sex difference in speed of or- 
gasm receives special consideration as 
an attribute of physiological re- 
sponse. They state: 

There is a longstanding and widespread opinion that the female is slower than the male in her sexual responses and needs more extended stimulation in order to reach orgasm....

Certain it is that many males reach orgasm before their wives do in their marital coitus, and many females experience orgasm in only a portion of their coitus ... but our analyses now make it appear that this opinion is based on a misinterpretation of the facts (p. 625).


After comparing the speed of masturbation of males and females, they state that “. . . the female is not appreciably slower than the male in her capacity [italics mine] to reach orgasm” (p. 626).

Then they conclude: “But because females are less often stimulated by psychological factors, they may not respond as quickly or as continuously as males in socio-sexual relationships” (p. 641).

It is clear from the above that 
Kinsey and his collaborators are not 
disputing the existence of differences 
in speed of sexual response in a 
heterosexual relationship. Rather, 
they deny a difference in “capacity” 
to reach orgasm. Apparently, capac- 
ity is what is left after the influence 
of experience is separated out. This 
view emerges in the following quota- 
tion: 


For instance, the exceedingly rapid re- 
sponses of certain females who are able to 
reach orgasm within a matter of seconds from 
the time they are first stimulated, and the re- 
markable ability of some females to reach 
orgasm repeatedly within a short period of 
time are capacities [italics mine] which most
other individuals could not conceivably ac- 
quire through training, childhood experience 
or any sort of psychiatric therapy. Similarly, 
it seems reasonable to believe that at least 
some of the females who are slower in their 
responses are not equipped anatomically or 
physiologically in the same way as those who 
respond more rapidly (p. 377). 


High-speed, high-frequency orgasm apparently represents a constellation of organic qualities which, like germ plasm, are relatively immune to the vicissitudes of experience. However, high-speed, high-frequency orgasm may be given a variety of psychological interpretations, including denial of homosexual feelings, feelings of inadequacy, and expressions of guilt or exhibitionism, among others. Accordingly, the reviewer believes it is unwarranted for the authors to dismiss sex differences in speed of response by first separating psychological factors from the performance, and then using speed of masturbation as the metric for sexual “capacity.”

Having eliminated anatomical and physiological factors (exclusive of hormones) as an explanation of sex differences in orgasm frequency, the authors then ascribe the main role to psychological factors. Conditioning carries the heuristic burden. Men are more often conditioned by their sexual experience (p. 649). This view is deduced from a sexual comparison of reports of arousal in thirty-three situations. These situations include observing the opposite sex, one’s own sex, portrayals of nude figures, own genitalia, etc. In 29 out of 33 of these situations, men reported a higher incidence of arousal than women. It is difficult to assess how many of these reported differences reflect a greater female sensitivity to our restrictive codes on such matters, a sensitivity which may have survived the two-hour interview with the Kinsey staff. For example, consider the four out of thirty-three sets of data in which the sex differences either disappeared or were inverted. A higher incidence of women than men reported arousal from observing moving pictures. The incidence of arousal was about the same for men and women when observing their own sex, reading romantic literature, and being bitten. These questions appear to the reviewer to be either less loaded for women or equally or more heavily loaded for men. For example, to report arousal from observing a nude of the same sex implies an admission of homosexual feelings, which would be equally undesirable for both sexes.

Let us assume that the reports are uncontaminated by socially determined sex differences in what is acceptable sexual behavior and treat them as valid data on sex differences in arousal. How do the authors explain the difference? For them to state that the male is conditioned by sexual experience more frequently than the female is no more than a reification of their findings. We need to know why males are conditioned more frequently. Is it due to a socially more tolerant attitude to the male when he seeks such experience? Is he more intensely interested in these experiences? Does his greater aggression lead to a more active expression and gratification of sexual interests? The explanation of Kinsey and his associates does not become obvious until they discuss neural factors in sexual response, where the following statement appears:

Since there are differences in the capacities 
of females and males to be conditioned by 
their sexual experience we might expect simi- 
lar differences in the capacities of females and 
males to be conditioned by other, non-sexual 
types of experience. On this point, however, 
we do not yet have information (p. 712). 


Apparently, from the cited statement, the authors believe that the higher frequency of human male sexual conditioning is due to his greater “capacity” to establish an association between substitute and reliable sexual stimuli, and that this capacity might extend to other types of reliable stimuli as well. This explanation (stated as a finding) seems farfetched. It assumes that the male has a greater capacity for associating a class of substitute stimuli which are not peculiar to sex (since almost any stimulus can be associated with sexual response). There is no evidence for an intrinsic superiority of the male in associating in any field.

There are more plausible explanations. The social code for the human female may be more restrictive, she may fear pregnancy, she may not feel the need for intercourse as often as the male, and the male may be more aggressive, to mention a few. Kinsey and his associates mention these possibilities in one connection or another, but they reject them in favor of their “capacity” theory.

5 I am assuming that conditioning is a form of association.

In discussing the locus of the neural 
mechanisms of sexual response, the 
authors point out that components of 
the response pattern, tumescence and 
ejaculation, are possible without sen- 
sation in paraplegics. The organism 
can thus function at the reflex level 
via the spinal cord without cerebral 
intervention. But while the sympa- 
thetic and parasympathetic nervous 
systems are also involved in the nor- 
mal situation, there is no definitive 
evidence concerning the role of the 
hypothalamus. 

The evidence on the role of the 
cortex is contradictory. On the one 
hand, a study on prefrontal lobotomy 
patients 3.3 years before, and another 
3.7 years after psychosurgery, shows 
no appreciable deviation in orgasm 
frequency from corresponding normal 
age and sex groups. On the other 
hand, the authors refer to data which 
show that damage to the cerebrum 
as a whole, and particularly to the 
cortex, may reduce an animal's 
capacity to react to psychosexual 
stimuli. The reduction is directly 
proportional to the extent of damage 
to the cortex. In spite of the contra- 
dictory evidence, they conclude that 
“although the data on the relation 
of the cortex to sexual behavior are 
limited, they do show that this is the 
part of the nervous system through 
which psychosexual stimuli are medi- 
ated” (p. 712). 

There are few performances indeed which cannot be credited, with equal justification, to the cerebral cortex. The authors find support for their position from the demonstration that the male rat’s copulatory behavior (and that of other animals) is more dependent on an intact cortex than that of the female. His performance requires more direction, coordination, and initiative. These differences make the male’s copulatory behavior more vulnerable to cortical damage, but it does not follow, even in the rat, that cortical differences are the cause of the sex differences in behavior.

The final chapter is more carefully written. In it the authors address themselves to the problems of sex differences in the effects of aging on frequency of orgasm, and whether or not these differences have hormone correlates. Table 1 condenses and reflects the data they provide on age trends.

TABLE 1
WEEKLY FREQUENCY OF ORGASM FROM ALL CAUSES

                    Age 15        Age 50
Single male         [illegible]   1.1
Single female       [illegible]   [illegible]

                    Age 20        Age 50
Married male        3.3           1.3
Married female      [illegible]   [illegible]

What hormone changes correlate with these differences? Data on hormone assays were obtained not from Kinsey respondents but from studies reported in the literature. Estrogen and androgen levels do not correlate with the Kinsey data. The changes of 17-ketosteroid levels with age show a closer relationship, but there are marked deviations in the age range from 15 to 27, making a simple causal relationship untenable.

In addition to the hormones men- 
tioned, they consider the effects of 
hormones from the pituitary, the 
thyroid, and others from the adrenals. 
While the latter may affect sexual 
behavior, it is largely as a result of 
their influence on the general meta- 
bolic level. 

They conclude that no hormone affects sexual preferences, interests, or techniques in a selective way, but that hormones may rather modify the general level of sexual activity.

To sum up, the conception of the 
nature of sex differences which Kin- 
sey and his associates have elaborated 
is at best awkwardly stated and mis- 
leading in many particulars. It would 
be wise to distinguish sharply be- 
tween the valuable data that they 
collected and their interpretation of 
sex differences. While the authors 
acknowledge repeatedly the impor- 
tance of psychological factors in 
understanding sex differences in or- 
gasm frequency, the constructs that 
they have made central to their pres- 
entation are sexual capacity and con- 
ditioning capacity. Since they think 
of capacity as something devoid of 
experience, we are left with the question, “What do they mean by psychological?”
JONES, ERNEST. The life and work of Sigmund Freud. Vol. I. The formative years and the great discoveries: 1856-1900. New York: Basic Books, 1953. Pp. xv+428. $6.75.


This volume is a thorough, careful, 
and erudite account of the origin of 
psychoanalysis, its invention and 
design by the greatest of psycho- 
analysts, and written by the only 
member of his privy council still 
living. The choice of this biographer was not only fortunate but almost essential, for he is the one remaining member of those six whom Freud chose after the first world war to keep alive his radical new orthodoxy.
Twice Freud had destroyed his let- 
ters, diaries, manuscripts, and notes 
in order to preserve his personal 
privacy, once with a chuckle as to 
how much work he was making for 
his biographer (p. xii). But others 
had kept letters; the elaborate cor- 
respondence with Fliess (1887-1902) 
was available, and the Freud family, 
at first disposed to protect Freud's 
privacy, finally put everything at 
Jones's discretion, because they thought that truth was to be preferred to the false legends that were
growing up about this controversial 
figure. Jones was thorough, as 263 
footnotes and 664 citations of sources 
attest. 

The volume describes, of course, 
only the immature Freud, the in- 
tense, emotional, insecure, frustrated, 
ambitious, impractical Freud, the 
Freud obsessed with making his place in his profession and in life and, as it proved ultimately, in history, the impoverished Freud too poor to marry his fiancée and for a time subordinating other ambitions to achieve that end. This was the Freud of
1880-1900. Six-sevenths of the 
volume treats of these two decades. 
Two other volumes are promised for 
the four decades that remain. So 
often Jones remarks on the contrast 
between this hard-driving, insecure, 
uncertain man and the wise, assured, 
tolerant, understanding savant who 
was the only Freud his disciples ever 
knew. This first book, however, does 
not show how Freud wrought his own 
maturity by self-analysis from 1897 
on. It discusses the analysis and lets 
the results remain to be seen in later 
volumes, leaving the life far short of 
culmination and this review merely a 
comment on an initial achievement 
whose meaning and value in 1900 
were to be assessed by events that 
were yet to come. 

It is not easy to perceive the whole 
Freud for any one year within these 
two intense decades. The biographer 
has chosen to devote separate chap- 
ters to different themes with greatly 
overlapping periods. Six chapters, 
for instance, deal respectively with 
six more or less synchronous histories. 

1. First there is Freud's brief medi- 
cal career, following his unduly pro- 
longed study as a medical student, 
the period in which he acquired ambition (1882-1885). Actually it was his intense desire to marry that awoke professional aspiration in him,
but Freud's relation to his fiancée is 
left for a later chapter. 

2. Then there is Freud's discovery of the medical properties of cocaine and its value for internal use as an analgesic and euphoric (1884-1887). Filled with enthusiasm for his new finding, Freud incorrectly advertised cocaine as not habit forming, and lived to regret this ill consequence of his impetuosity. He missed, however, making the greatest discovery, the use of cocaine externally by injection as a local anesthetic. He had handed the preliminary indications of this application over to someone else, who presently and quite properly got the credit. Freud seems to have been blocked in pushing this line of research, perhaps because he had no taste for manipulatory experimentation or perhaps because his desire to see his fiancée after a long absence won out over his staying in Vienna to prosecute these obvious experiments that would, as it turned out, have advanced his career.

3. Another chapter in the book 
deals with Freud's betrothal, per- 
sistently frustrated by his poverty 
(1882-1886). His fiancée could ex- 
pect no sizable dowry, and Freud 
seems at first to have been driven to professional work in order to get enough money to marry. Again and
again indigence prevented him from 
visiting his betrothed or being able 
to give her presents. When you read 
of these years in this chapter, you 
could think that wishing and plan- 
ning for marriage and writing love 
letters took up all of Freud’s time, 
but you would be wrong. He was 
pursuing half a dozen other activities 
simultaneously. 

4. The chapter on Freud’s mar- 
riage (1886) shows how need reduc- 
tion quickly brought satisfaction, and 
thereafter parenthood (six children) 
and a ready lifelong monogamy. One 
cannot help believing that Freud 
was strengthened in support of his 
own theory of sexuality and in with- 
standing the odium sexuale that was
directed toward him, because his own 
sexual desires were essentially con- 
ventional, both as to monogamy and 
as to male dominance. 
5. The chapter on the neurological period depicts Freud at first still convinced that histology and morphology are the best biological disciplines, the most scientific, a view due in part to the influence first of Brücke upon Freud, and later of the histologist Meynert (1883-1897). Freud was by early preference a physicalist. He preferred organic to functional explanation and disliked what was in those days called psychology. Brücke was one of the four pupils of Johannes Müller who had declared themselves vehemently against vitalism in the 1840's. Helmholtz was another, and Jones throughout this book speaks of the influence of the “Helmholtz school” as directing Freud away from the vagaries of psychology toward the hard facts of physiology and the harder ones of anatomy. Anatomy, however, did not work out then any better than now to provide an understanding of human thought and conduct, and Freud's writing of his volume on aphasia in 1891, after he had been exposed to the magic of Charcot in 1885 and had transferred his loyalty from Meynert to Charcot, witnessed his acceptance of functional explanation when organic factors could not be found. Later, of course, he turned to psychology, still recognizing the values of the Helmholtz school by characterizing the new psychological principles as “mechanisms.”


6. Then there is Freud's long and 
intimate association, both personal 
and scientific, with Breuer, fourteen 
years his senior. Breuer believed in
Freud. He lent him money that was 
seldom repaid. Both of them were 
investing Breuer’s money, Freud be- 
lieved, in the promising enterprise of 
Freud's thought and research. At 
first the two men were associated in 
work, and it was Breuer who dis- 
covered the cathartic method and 
thus in a way began psychoanalysis.
Later Breuer, frightened by the phe- 
nomenon of transference in a woman 
patient, escaped from the field, leav- 
ing Freud, who had greater persist- 
ence and no sense of guilt when pa- 
tients fell in love with him, to carry 
on. Much later, when Fliess had re- 
placed Breuer as the father-image in 
Freud's life, Freud turned to bitter 
criticism of Breuer in his letters, but 
he never attacked him in print. 

So there you are. If you pick out 
the year 1885, for instance, you find
Freud eating his heart out to get 
married, yet with time and energy 
left for hospital work, for his cocaine 
research, for his study of the neurol- 
ogy of the medulla, for his work with 
Breuer after the cathartic method 
had become available. Any other 
year is equally full. Jones, it seems to 
me, fails to make you feel the inten- 
sity and range of Freud's avidity; 
yet you can figure it out for yourself 


if you synthesize the chapters. 


The immature Freud needed a 
father-image, someone to whom he 
could defer and whose approval and 
support he could win. He had three 
in succession during his professional 
life: Brücke, Breuer, and Fliess.
After that his self-analysis rendered 
him strong enough to get along with 
disciples only. 

It was from Brücke, rigorous physi-
ologist and neurologist, that Freud 
got the faith in neurological determin- 
ism which later in the Traumdeutung 
of 1900 he turned into psychic deter- 
minism. 

Breuer was the faithful admiring 
friend and collaborator. He gave 
Freud much and required little. He 
could not, however, bring himself to 
continue collaboration, and later,
as Freud’s theory of sexuality began 
to dominate his professional thought, 
a coolness developed between the 
two. 
After Breuer, Fliess! Nowadays we
should call Fliess a brilliant numero- 
logical crank. He believed passion- 
ately that all living matter, even the 
individual cell, is bisexual, that living 
nature is based on periodicity, and 
that the basic periods are 28 (the 
female period, the menstrual cycle) 
and 23 (the male period, the interval 
between menstrual cycles). Freud, 
no mathematician, was captivated 
by Fliess’s ability to use mathe- 
matics to compute conclusions out 
of these basic numbers, but actually 
what the two gave each other was 
mutual admiration and the self-confi- 
dence that goes with appreciation and 
the appearance of understanding. 
They wrote to each other at least 
once a week, criticized each other's 
manuscripts, got together in important special meetings (they called them “congresses”) whenever they
could. Freud played the dependent 
role, and, when the break came later, 
Fliess initiated it. This was an in- 
tense 15-year friendship (1887-1902), 
which illustrates well the operation 
of human needs but does little to 
dignify the character of Freud. No 
wonder he destroyed his half of this 
correspondence after his self-analysis. 
Freud's letters to Fliess were saved.¹

You have to do a little guessing and a little reasoning to find out how Freud worked and what it was he called work. He made “discoveries,” and the subtitle of the book is “The Formative Years and the Great Discoveries.” You know that he worked; that he worked hard; that he worked best when he did not feel well; that the intensity of his work often wore him out; that his successes were capricious, for there were long times when no insight would come and then suddenly a particular problem would find itself resolved. He seems irregularly to have had these insights, then to have formulated hypotheses, and then to have adjusted and reshaped them as he kept applying them to his patients in analysis and in former analyses. He seems almost never to have accepted a new hypothesis quickly, but only slowly and, as it were, reluctantly, after much checking. Nor did his accepted hypotheses always stay right. There was Freud's belief that adult hysterical symptoms always have their roots in infantile sexual experience. What at first seemed to confirm this belief was the constant report in analyses of incestuous relations between parents and children. Freud, as well as Breuer, had difficulty in accepting this finding as a fact, and then suddenly in 1897 Freud realized that these attempted seductions were nothing more than patients’ fantasies that had never really occurred. That “discovery” was a blow to Freud, for it seemed to knock out half of the structure of psychoanalysis, leaving intact only the theory of dreams as wish-fulfillment. Actually, as it turned out, the “discovery” strengthened the theory of infantile sexuality, weakening only the false belief in parental sexuality.

¹ And have been published since this review was written. FREUD, SIGMUND. The origins of psychoanalysis: Sigmund Freud's letters to Wilhelm Fliess. New York: Basic Books, 1954. $6.75.

Although Freud usually was slow to accept these insights, once in a while an insight was sudden enough to be dated. One such instance is indicated by Freud's suggestion, in a moment of gaiety, that a marble tablet should be erected at the table near the northeast corner of the terrace of Bellevue Restaurant in Vienna. The tablet was to read: “Here the secret of dreams was revealed to Dr. Sigm. Freud on July 24, 1895” (p. 354).
Freud's technic was not, of course,
the experimental method. He him- 
self remarked in 1900: “I am not 
really a man of science, not an ob- 
server, not an experimenter, and not 
a thinker. I am nothing but by tem- 
perament a conquistador—an ad-
venturer” (p. 348). And at another 
time: “It seems to be my fate to dis- 
cover only the obvious: that children 
have sexual feelings, which every 
nursemaid knows; and that night 
dreams are just as much wish-fulfill- 
ment as day dreams” (p. 350). On 
the other hand, one should not take 
Freud’s modesty and momentary 
self-depreciation too literally. He 
thought of himself as a discoverer, 
and those insights that stayed valid
were the result of long and laborious 
thinking within the monotony of 
psychoanalyses and their repeated 
review. Freud's technic lay some- 
where between that of the experi- 
mental psychologist, who alters an 
independent variable and observes 
the result, and the philosopher-psychologist, who induces generalizations about human nature from the
reservoir of his experience. Freud 
made his generalizations from a 
wealth of specialized experience and 
then tested his hypotheses out against 
particular cases, increasing his as- 
surance about the validity of each 
induction as the number of consistent 
cases grew. He had, however, no 
control, either in the sense of the 
rigorous constraint of contributing 
factors or in the sense of adding the 
method of difference to the method of 
agreement. Indeed he seems to have 
been restricted to Mill’s method of 
agreement, pure and simple, a meth- 
od which by itself is clearly unsafe. 

Was Freud original? He fitted his Zeitgeist, of course, as all great men do. Brücke influenced him in respect of determinism, and Meynert in respect of active ideas and their repression, for Meynert was an admirer
of Herbart, who in these matters is 
clearly an intellectual ancestor of 
Freud's. Jones mentions as also in- 
fluential Fechner and Griesinger, both 
of whom derived many of their 
ideas from Herbart. Breuer’s influ-
ence we have already noted. Charcot 
captured Freud’s admiration when he 
visited him in 1885 and later Bern- 
heim became a lesser influence. Char- 
cot it was who led Freud to shift his 
allegiance from the organic to the 
functional and thus away from Mey- 
nert. Jones does not find that Bren- 
tano influenced Freud, although 
Freud knew him, sat under him in 
lectures, and translated a volume of J. S. Mill at his recommendation. There is no relation between Janet and
Freud except that both were affected 
by the same Zeitgeist and by Charcot. 

Yet all this fitting of Freud into 
his intellectual genetics is not very important. Certainly Freud was
vastly original. His conceptions of 
infantile sexuality and of the psy- 
chological mechanisms are not to be 
found in Charcot or elsewhere, and 
the importance of his originality is to 
be measured by its effects. In this 
reviewer's judgment Freud is a great 
man because he influenced thought 
and civilization more than any other 
person whom psychology might claim 
in the last three hundred years. A 
Tolstoy might have shown him as 
more the agent of an irresistible
Zeitgeist than as a great originator, 
and still he would be a Great Man, 
for that is what Great Men are. 
Kings are History's slaves. That the 
kings who have commanded the 
choices progress should make are 
singled out by posterity to be called 
great is the proof of man’s essential 
insecurity and his need to find safety 
in pride and reverence for others than 
himself. Freud would have recog-
nized such a fact without losing his 
faith in Helmholtzian determinism; 
he would have believed that greatness 
is not diminished by its having 
causes. 
EDWIN G. BORING.
Harvard University. 


FLOYD, W. F., & WELFORD, A. T.
(Eds.) Symposium on fatigue. 
London: H. K. Lewis & Co., 1953. 
Pp. vii+196. 24s. 

This report of both English and 
American workers is concerned with 
various aspects of inadequate per- 
formance of the human, and tech- 
niques for detecting and measuring 
them. Both old and new slants are 
represented. While the symposium 
is labeled as one on fatigue, no verbal- 
ized convention is agreed upon as to 
what is fatigue and what is to be 
otherwise defined. In some cases, 
the participants admit that they are 
not really reporting on fatigue but 
rather upon decreasing performance 
(R. C. Browne). Some tacitly imply 
that fatigue is performance decre- 
ment (R. M. Gagné), some explicitly 
state it (W. T. Singleton). A. T. 
Welford explicitly recognizes that 
there are several distinctly different 
phenomena called fatigue. One is a 
subjective state related to some kind 
of physical or mental strain. The 
other is a decrement in performance 
following more or less prolonged activity. He states that the psychologist takes a position “between the two.” This is to say that the psychologist must “keep in touch with the man in the street”; he must “build his theories in such a way that they can contain the experiences and behavior of ordinary people.” Welford further states that “it is to physiology that he must look for explanations.”
Others try to connect fatigue with
learning and drives as defined in 
prevalent learning theory (D. R. 
Davis). It seems as though learning 
deserves psychological theorizing but 
that fatigue does not. T. A. Ryan 
deals very ably with the factor of 
effort in performance and thus comes 
closest to saying that fatigue is a 
proper study for psychologists alongside other phenomena such as anxiety. Though he did not, Ryan might
have said that since he found effort 
definable only in experiential terms, 
the sine qua non of fatigue is an identi- 
fiable experience, and at the same 
time a respectable object of scientific 
inquiry. 

Nowhere throughout the printed re- 
port of the symposium does there 
seem to be a recognition that the 
topic of fatigue is in an undefined and 
therefore unusable state for intelli- 
gent discussion and inquiry. There 
was no recognition of the logically 
intolerable condition of having a 
number of terms used synonymously 
in technical discourse; e.g., fatigue, 
boredom, tiredness, work decrement, 
disorganization of performance, and 
impairment. Because of this defi- 
ciency and what follows from it, all 
these very careful and ingenious 
studies lack the orientation and 
meaningfulness that they would 
otherwise have. 

It is to be hoped that the need to 
systematize knowledge and informa- 
tion, and to orient ourselves by sys- 
tematic thinking, will soon be recog- 
nized to be as urgently required as 
the continued collection of more data. 
To close with a positive note, it can 
be said that the reader will find in 
the symposium a number of ingenious 
experiments that are well worth his 
acquaintance. 

S. HOWARD BARTLEY.

Michigan State College. 
HOVLAND, CARL I., JANIS, IRVING L.,
AND KELLEY, HAROLD H. Com- 
munication and persuasion. New 
Haven: Yale Univer. Press, 1953. 
Pp. xii+315. $4.50. 

This volume brings together the 
findings of the Yale Communication 
Research Program, a program de- 
voted to the study of the modification 
of attitudes and opinions through 
communication by means of con- 
trolled experiments. The project was 
set up as a cooperative research and 
study group rather than in a central- 
ized, hierarchically organized form. 
Approximately 30 individuals con- 
tributed to the work, each being en- 
couraged to pursue his own interests 
and theoretical bent. 

Ideal as this plan may be from 
many points of view, it clearly has 
made the job of reporting a difficult 
one. The researches are uneven in 
quality, and stem from a variety of 
theoretical approaches, including, 
among others, Hull's learning theory, 
some motivational hypotheses of 
Freud and other psychoanalysts, and 
some of the formulations of Lewin, 
Sherif, Newcomb, and others concerning the effects of group membership. The authors of Communication and Persuasion, therefore, adopted a structural organization, using topical headings of (a) the communicator, (b) content of the communication, (c) audience predispositions, including group conformity motives and individual personality factors, and (d) responses, including overt expression of opinion and retention of opinion change.
organizational task they faced, the 
authors are to be congratulated on the 
book’s structural unity, ease of tran- 
sition, and readability. 

The experiments themselves deal 
almost exclusively with the effects of 
one-way communication on “captive” audiences, opinion change and
retention being assessed by means of 
questionnaires. The resulting limi- 
tations are obvious, but since most of 
the questionnaires were carefully 
worked out with numerous cross 
checks, and since all the experiments 
are primarily directed toward basic 
theoretical rather than immediately 
practical aims, it seems likely that 
many of the generalizations will hold 
for more complex communication 
situations evaluated in other ways. 
The probability of this being so is 
increased by the fact that the authors 
have made a commendably thorough 
review of the relevant literature, 
which is presented for each experi- 
ment or set of experiments as the 
context within which the new find- 
ings are considered. In addition, 
most of the experimenters have made 
admirable use of advanced experi- 
mental designs which allow the assess- 
ment of numerous variables at once, 
both singly and in interaction. 

In summary, this book is an ex- 
cellent report of studies that make 
a substantial contribution to an im- 
portant area of social psychology. It 
is a source of many experimentally 
derived, thought-provoking general- 
izations concerning the processes of 
communication and opinion change. 
These generalizations, however, tend 
to be quite limited, specific, and ten- 
tative, thus reflecting the present 
state of knowledge in the field, the 
over-all eclectic nature of the research 
reported, and the authors’ preference 
for sticking close to the available 
data. The volume also provides an 
excellent review and analysis of a 
large area of communication litera- 
ture, as well as a number of valuable 
suggestions for future research. 

F. P. KILPATRICK.

Princeton University. 
BINGHAM, WALTER VANDYKE. Homo sapiens auduboniensis. New York:
National Audubon Society, 1953. 
Pp. 39. $1.10. 


A significant, if not the most con- 
spicuous, phase of the late Walter 
Bingham’s qualities receives a sidelight from the address “Homo Sapiens Auduboniensis,” which he gave
in October 1937 at a dinner of the 
Audubon Camp held at the American 
Museum of Natural History. The 
address gives its title to a recent 
memorial volume of which it is the 
nucleus. 

Following a brief, touching fore- 
word in the name of the Audubon 
Society, there is a memoir by Walter 
Bingham's wife, Millicent Todd Bing- 
ham, which embodies the major por- 
tion of the 39-page text. Its title, 
“Beyond Psychology,” is not invidi- 
ous to our discipline, meaning rather, 
“What should they know of Bingham 
who only Bingham knew” for his 
professional accomplishment, hav- 
ing but fragmentary perception of his 
various reachings into the humanities 
and the arts, particularly music; not 
to say the depth and breadth of good 
will he was so apt in implementing, 
his steady cheeriness, his hard prac- 
tical sense, his effectiveness in physi- 
cal emergency. 

Then come the five pages of the 
“Homo Sapiens Auduboniensis” itself. The pages that one of our group
would least want to miss are the two 
at the volume’s close, a (1948) letter 
of Bingham's to the able and energetic director of the Audubon Camp, Carl Buchheister. It concerns the
lag of social behind material culture 
and something to do about it; com- 
pare also the note on an earlier letter 
to Clark Wissler (pp. 26-27). The 
address itself, as implied in its title, 
is mainly a whimsical characterization of Audubon Camp participants, phrased in natural history terms. For Bingham's professional colleagues,
this theme has perhaps less intrinsic 
interest than other portions of the 
volume, though towards the close he 
emerges from the role-taking more 
fully. The remarks have a good deal 
more interest for the psychographer, 
for the part that humor played in 
Bingham’s personality, a sort of test- 
ing the limits of facetiousness that 
he would allow himself. 

What was the role in Bingham’s 
personality of the “humor” that Mrs. 
Bingham simply mentions (p. 15) 
after rather discounting it on the 
preceding page? In humor’s ethically 
highest reach,¹ Bingham was at home
with the best minds. In the other 
levels, more connoted by “sense of 
humor,” he was progressively less at
home. Fun-loving would hardly de- 
scribe him, though he was ready 
enough that others should be so. In 
the stereotype of the salon he would 
have passed for a relatively humorless 
man. Still less was he one to deal in 
the Joke Proper; and Flippancy 
tended to disgust him. At any level, 
the humor of “Wit” was not Bing-
ham’s cup of tea; that of Joy em- 
phatically was. 


¹ “. . . Joy, Fun, the Joke Proper, and Flippancy. You will see the first among friends and lovers reunited on the eve of a holiday. Something like it is expressed in much of that detestable art which the humans call Music. . . .” (C. S. Lewis, The Screwtape Letters, p. 57).
The ego-defense dynamic of wit
seems well established, though it 
matters much what it is a defense 
against; compare Abraham Lincoln 
with Jonathan Swift. In a character 
like Bingham there is little need for 
such defense, and it does not develop 
readily or far. To a whimsicality 
like “Homo Sapiens Auduboniensis”
he could bring himself; satire would 
have been mostly beyond him, or 
beneath him, as you please. Try to 
imagine Walter Bingham composing 
a Lilliput, let alone a Modest Proposal. 

Bingham’s professional colleagues 
may well thank the National Audu- 
bon Society for their graceful tribute, 
and for the light it throws on this 
comparatively unconsidered facet of 
Bingham’s rich personality. Beyond 
this, of course, looms the whole 
question of the role of nature study in 
the benign socialization of the indi- 
vidual. It can be a negative one. How 
much did love of nature help to 
make Bingham what he was? How 
much did Bingham, being what he 
was, have to respond to nature in 
this way? We are not likely to know. 
What we do know is that this love 
of nature had no small part in one of 
the best and wisest men that ever 
made psychology his profession; that 
he had a lively and steadfast faith 
in its benign influence on others, and 
in its communicability. 


FRED L. WELLS.
Newton Highlands, Massachusetts. 